COMPUTATIONAL METHOD FOR IDENTIFYING ADHESIN AND
ADHESIN-LIKE PROTEINS OF THERAPEUTIC POTENTIAL

Field of the present Invention 

A computational method for identifying adhesin and adhesin-like proteins; a computer
system for performing the method; and genes encoding adhesin and adhesin-like
proteins, together with the encoded proteins.

Background of the present Invention

The progress in genome sequencing projects has generated a large number of inferred
protein sequences from different organisms. It is expected that the availability of
information on the complete set of proteins from infectious human pathogens will
enable us to develop novel molecular approaches to combat them. A necessary step in
the successful colonization and subsequent manifestation of disease by microbial
pathogens is the ability to adhere to host cells.

Microbial pathogens encode several proteins known as adhesins that mediate their
adherence to host cell surface receptors, membranes, or extracellular matrix for
successful colonization. Investigations into this primary event of host-pathogen
interaction over the past decades have revealed a wide array of adhesins in a variety of
pathogenic microbes. Presently, substantial information on the biogenesis of adhesins
and the regulation of adhesin factors is available. One of the best understood
mechanisms of bacterial adherence is attachment mediated by pili or fimbriae. Several
afimbrial adhesins have also been reported. In addition, limited knowledge of the target
host receptors has been gained (Finlay, B.B. and Falkow, S 1997).

New approaches to vaccine development focus on targeting adhesins to abrogate the
colonization process (Wizemann, et al 1999). However, the specific role of particular
adhesins has been difficult to elucidate. Thus, prediction of adhesins or adhesin-like
proteins and their functional characterization is likely to aid not only in deciphering the
molecular mechanisms of host-pathogen interaction but also in developing new vaccine
formulations, which can be tested in suitable experimental model systems.

One of the best understood mechanisms of bacterial adherence is attachment mediated
by pili or fimbriae, for example the FimH and PapG adhesins of Escherichia coli (Maurer,
L. and Orndorff, P. 1987; Bock, K. et al 1985). Other examples of pili group adhesins
include type IV pili in Pseudomonas aeruginosa, Neisseria species, Moraxella species,
Enteropathogenic Escherichia coli and Vibrio cholerae (Sperandio V et al 1996).

Examples of afimbrial adhesins are the HMW proteins of Haemophilus influenzae (van
Schilfgaarde 2000), the filamentous hemagglutinin and pertactin of Bordetella pertussis
(Bassinet et al 2000), the BabA of H. pylori (Yu J et al 2002) and the YadA adhesin of
Yersinia enterocolitica (Neubauer et al 2000). The intimin receptor protein (Tir) of
Enteropathogenic E. coli (EPEC) is another type of adhesin (Ide T et al 2003). Other
classes of adhesins include the MrkD protein of Klebsiella pneumoniae, Hia of H. influenzae
(St Geme et al 2000), Ag I/II of Streptococcus mutans, SspA and SspB of
Streptococcus gordonii (Egland et al 2001), FnbA and FnbB of Staphylococcus aureus,
SfbI, protein F of Streptococcus pyogenes, and the PsaA of Streptococcus pneumoniae (De
et al 2003).

A known example of adhesins approved as a vaccine is the acellular pertussis vaccine
containing FHA and pertactin against B. pertussis, the causative agent of whooping
cough (Halperin, S et al 2003). Immunization with FimH is being evaluated for
protective immunity against pathogenic E. coli (Langermann S et al 2000). In
Streptococcus pneumoniae, PsaA is being investigated as a potential vaccine candidate
against pneumococcal disease (Rapola, S et al 2003). Immunization results with BabA
adhesin showed promise for developing a vaccine against H. pylori (Prinz, C et al
2003). A synthetic peptide sequence anti-adhesin vaccine is being evaluated for
protection against Pseudomonas aeruginosa infections.

Screening for adhesin and adhesin-like proteins by conventional experimental methods
is laborious, time consuming and expensive. As an alternative, homology search is used
to facilitate the identification of adhesins. Although this procedure is useful in the
analysis of genome organization (Wolf et al 2001) and of metabolic pathways
(Peregrin-Alvarez et al 2003, Rison et al 2002), it is somewhat limited in allowing
functional predictions when the homologues are not functionally characterized or the
sequence divergence is high. Assignment of functional roles to proteins based on this
technique has been possible for only about 60% of the predicted protein sequences
(Fraser et al 2000). Thus, we explored the possibility of developing a non-homology
method based on sequence composition properties combined with the power of
Artificial Neural Networks to identify adhesins and adhesin-like proteins in species
belonging to a wide phylogenetic spectrum.

Twenty years ago, Nishikawa et al carried out some of the early attempts to classify
proteins into different groups based on compositional analysis (Nishikawa et al 1983).
More recently, the software PropSearch was developed for analyzing protein sequences
where conventional alignment tools fail to identify significantly similar sequences
(Hobohm, U. and Sander, C 1995). PropSearch uses 144 compositional properties of
protein sequences to detect possible structural or functional relationships between a
new sequence and sequences in the database. Recently, the compositional attributes of
proteins have been used to develop software for predicting secretory proteins in
bacteria and apicoplast targeted proteins in Plasmodium falciparum by training
Artificial Neural Networks (Zuegge et al 2001).

Zuegge et al used the 20 amino acid compositional properties. Their objective
was to extract features of apicoplast targeted proteins in Plasmodium falciparum. This
is distinct from our software SPAAN, which focuses on adhesins and adhesin-like proteins
involved in host-pathogen interaction.

Hobohm and Sander used 144 compositional properties, including isoelectric point
and amino acid and dipeptide composition, to generate hypotheses on the putative
functional role of proteins that are refractory to analysis using sequence alignment
based approaches like BLAST and FASTA. Hobohm and Sander do not specifically
address the issue of adhesins and adhesin-like proteins, which is the focus of SPAAN.

Nishikawa et al had originally attempted to classify proteins into various functional
groups. This was a curiosity driven exercise but eventually led to the development of
software to discriminate extra-cellular proteins from intracellular proteins. This work
did not address the issue of adhesins and adhesin-like proteins, which is the focus of
SPAAN.

Thus, none of the aforementioned research groups have been able to envisage the
methodology of the instant application. The inventive method of this application
provides novel proteins and corresponding gene sequences.

Adhesins and adhesin-like proteins mediate host-pathogen interactions. This is the first
step in colonization of a host by microbial pathogens. Attempts worldwide are focused
on designing vaccine formulations comprising adhesin proteins derived from
pathogens. When immunized, the host will have its immune system primed against
the adhesins of that pathogen. When the pathogen is actually encountered, the surveillance
mechanism will recognize these adhesins, bind them through antigen-antibody
interactions and neutralize the pathogen through the complement-mediated cascade and
other related clearance mechanisms. This strategy has been successfully employed in
the case of whooping cough and is being actively pursued in the case of pneumonia,
gastric ulcer and urinary tract infections.

Objects of the present Invention

The main object of the present invention is to provide a computational method for
identifying adhesin and adhesin-like proteins of therapeutic potential.

Another object of the invention is to provide a method for screening proteins with
unique compositional characteristics as putative adhesins in different pathogens.

Yet another object of the invention is to provide the use of gene sequences encoding
the putative adhesin proteins useful as preventive therapeutics.

Summary of the present Invention

A computational method for identifying adhesin and adhesin-like proteins, said method
comprising steps of computing the sequence-based attributes of protein sequences using
five attribute modules of software SPAAN, (i) amino acid frequencies, (ii) multiplet
frequency, (iii) dipeptide frequencies, (iv) charge composition, and (v) hydrophobic
composition, training the Artificial Neural Network (ANN) for each of the computed
five attributes, and identifying the adhesin and adhesin-like proteins having a probability
of being an adhesin (P_ad) > 0.51; a computer system for performing the method; and
genes encoding adhesin and adhesin-like proteins, together with the encoded proteins.

Detailed description of the present Invention

Accordingly, the present invention relates to a computational method for identifying
adhesin and adhesin-like proteins, said method comprising steps of computing the
sequence-based attributes of protein sequences using five attribute modules of software
SPAAN, (i) amino acid frequencies, (ii) multiplet frequency, (iii) dipeptide frequencies,
(iv) charge composition, and (v) hydrophobic composition, training the Artificial Neural
Network (ANN) for each of the computed five attributes, and identifying the adhesin
and adhesin-like proteins having a probability of being an adhesin (P_ad) > 0.51; a
computer system for performing the method; and genes encoding adhesin
and adhesin-like proteins, together with the encoded proteins.

In an embodiment of the present invention, wherein the invention relates to a
computational method for identifying adhesin and adhesin-like proteins, said method
comprising steps of:

a. computing the sequence-based attributes of protein sequences using five
attribute modules of a neural network software, wherein the attributes
are (i) amino acid frequencies, (ii) multiplet frequency, (iii)
dipeptide frequencies, (iv) charge composition, and (v) hydrophobic
composition,

b. training the Artificial Neural Network (ANN) for each of the computed
five attributes, and

c. identifying the adhesin and adhesin-like proteins having a probability of
being an adhesin (P_ad) > 0.51.


In another embodiment of the present invention, wherein the invention relates to a
method wherein the protein sequences are obtained from pathogens, eukaryotes, and
multicellular organisms.

In an embodiment of the present invention, wherein the invention relates to a method,
wherein the protein sequences are obtained from the pathogens selected from a group
of organisms comprising Escherichia coli, Haemophilus influenzae, Helicobacter
pylori, Mycoplasma pneumoniae, Mycobacterium tuberculosis, Rickettsia prowazekii,
Porphyromonas gingivalis, Shigella flexneri, Streptococcus mutans, Streptococcus
pneumoniae, Neisseria meningitidis, Streptococcus pyogenes, Treponema pallidum and
Severe Acute Respiratory Syndrome associated human coronavirus (SARS).

In yet another embodiment of the present invention, wherein the method of the
invention is a non-homology method.

In still another embodiment of the present invention, wherein the invention relates to
the method using 105 compositional properties of the sequences.

In still another embodiment of the present invention, wherein the invention relates to a 
method showing sensitivity of at least 90%. 

In still another embodiment of the present invention, wherein the invention relates to
the method showing specificity of 100%.

In still another embodiment of the present invention, wherein the invention relates to a 
method identifying adhesins from distantly related organisms. 

In still another embodiment of the present invention, wherein the neural network has a
multi-layer feed forward topology, consisting of an input layer,
one hidden layer, and an output layer.

In still another embodiment of the present invention, wherein the number of neurons in
the input layer is equal to the number of input data points for each attribute.

In still another embodiment of the present invention, wherein "P_ad" is a weighted
linear sum of the probabilities from the five computed attributes.

In still another embodiment of the present invention, wherein each trained network
assigns a probability value of being an adhesin for the protein sequence.

In still another embodiment of the present invention, wherein the invention relates to a
computer system for performing the method of claim 1, said system comprising a
central processing unit executing the SPAAN program, which gives probabilities based on
different attributes using an Artificial Neural Network, and other built-in programs for
assessing attributes, all stored in a memory device accessed by the CPU; a display on
which the central processing unit displays the screens of the above mentioned programs
in response to user inputs; and a user interface device.

In still another embodiment of the present invention, wherein the invention relates to a
set of 274 annotated genes encoding adhesin and adhesin-like proteins, having SEQ ID
Nos. 385 to 658.

In still another embodiment of the present invention, wherein the invention relates to a 
set of 105 hypothetical genes encoding adhesin and adhesin-like proteins, having SEQ 
ID Nos. 659 to 763. 

In still another embodiment of the present invention, wherein the invention relates to a
set of 279 annotated adhesin and adhesin-like proteins of SEQ ID Nos. 1 to 279.

In still another embodiment of the present invention, wherein the invention relates to a
set of 105 hypothetical adhesin and adhesin-like proteins of SEQ ID Nos. 280 to 384.

One more embodiment of the present invention, wherein the invention also relates to a
fully connected multilayer feed forward Artificial Neural Network based on the
computational method as claimed in claim 1, comprising an input layer, a hidden
layer and an output layer which are connected in the said sequence, wherein each
neuron is a binary digit number and is connected to each neuron of the subsequent layer
for identifying adhesin or adhesin-like proteins, wherein the program steps comprise:

[a] feeding a protein sequence in FASTA format;

[b] processing the sequence obtained in step [a] through the 5 modules named A, C, D,
H and M, wherein attribute A represents an amino acid composition, attribute C
represents a charge composition, attribute D represents a dipeptide composition of the
20 dipeptides [NG, RE, TN, NT, GT, TT, DE, ER, RR, RK, RI, AT, TS, IV, SG, GS, TG,
GN, VI and HR], attribute H represents a hydrophobic composition and attribute M
represents amino acid frequencies in multiplets, to quantify 5 types of compositional
attributes of the said protein sequence to obtain numerical input vectors respectively
for each of the said attributes, wherein the sum of numerical input vectors is 105;

[c] processing of the numerical input vectors obtained in step [b] by the input neuron
layer to obtain signals, wherein the number of neurons is equal to the number of
numerical input vectors for each attribute;

[d] processing of signals obtained from step [c] by the hidden layer to obtain synaptic
weighted signals, wherein the optimal number of neurons in the hidden layer was
determined through experimentation for minimizing the error at the best epoch for
each network individually;

[e] delivering synaptic weighted signals obtained from step [d] to the output layer for
assigning of a probability value for each protein sequence fed in step [a] as being an
adhesin by each network module;

[f] using the individual probabilities obtained from step [e] for computing the final
probability of a protein sequence being an adhesin, denoted by the P_ad value, which is
a weighted average of the individual probabilities obtained from step [e] and the
associated fraction of correlation, which is a measure of the strength of the prediction.
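
Step [f] can be illustrated with a minimal sketch. It assumes that each module's probability is weighted by a non-negative weight (taken here to be that module's fraction of correlation) and that the sum is normalized by the total weight. The function name `combine_pad` and all numeric values are hypothetical and are not taken from SPAAN.

```python
def combine_pad(probs, weights):
    """Weighted average of per-module adhesin probabilities.

    probs   -- dict mapping module name (A, C, D, H, M) to its probability
    weights -- dict mapping the same names to a non-negative weight
               (assumed here to be each module's fraction of correlation)
    """
    if set(probs) != set(weights):
        raise ValueError("probs and weights must cover the same modules")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(probs[m] * weights[m] for m in probs) / total


# Hypothetical module outputs and weights, for illustration only.
example_probs = {"A": 0.9, "C": 0.6, "D": 0.8, "H": 0.7, "M": 0.5}
example_weights = {"A": 1.0, "C": 1.0, "D": 1.0, "H": 1.0, "M": 1.0}

pad = combine_pad(example_probs, example_weights)
is_adhesin = pad > 0.51  # threshold stated in the method
```

With equal weights the result reduces to the plain mean of the five module outputs; unequal weights let a more reliable module contribute more to P_ad.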

In still another embodiment of the present invention, wherein the input neuron layer
consists of a total of 105 neurons corresponding to 105 compositional properties.

In still another embodiment of the present invention, wherein the hidden layer
comprises 30 neurons for amino acid frequencies, 28 for multiplet
frequencies, 28 for dipeptide frequencies, 30 for charge composition and 30 for
hydrophobic composition.

In still another embodiment of the present invention, wherein the output layer
comprises neurons to deliver the output values as a probability value for each protein
sequence.

Identification of novel adhesins and their characterization are important for studying
host-pathogen interactions and testing new vaccine formulations. We have employed
Artificial Neural Networks to develop an algorithm, SPAAN (Software for Prediction of
Adhesin and Adhesin-like proteins using Neural Networks), that can identify adhesin
proteins using 105 compositional properties of a protein sequence. SPAAN could
correctly predict well characterized adhesins from several bacterial species and strains.
SPAAN showed 89% sensitivity and 100% specificity in a test data set that did not
contain proteins in the training set. Putative adhesins identified by the software can
serve as potential preventive therapeutics.

The present invention provides a novel computational method for identifying adhesin
and adhesin-like proteins of therapeutic potential. More particularly, the present
invention relates to candidate genes for these adhesins. The invention further provides
new leads for development of candidate genes, and their encoded proteins, in their
functional relevance to preventive approaches. This computational method involves
calculation of several sequence attributes, and their subsequent analysis leads to the
identification of adhesin proteins in different pathogens. Thus, the present invention is
useful for identification of the adhesin proteins in pathogenic organisms. The adhesin
proteins from different genomes constitute a set of candidates for functional
characterization through targeted gene disruption, microarrays and proteomics. Further,
these proteins constitute a set of candidates for further testing in development of
preventive therapeutics. Also provided are the genes encoding the candidate adhesin
proteins.

The present method offers novelty in the principles used and the power of Neural
Networks to identify new adhesins compared to laborious and time consuming
conventional methods. The present method is based on compositional properties of
proteins instead of sequence alignments. Therefore, this method has the ability to
identify adhesin and adhesin-like proteins from bacteria belonging to a wide
phylogenetic spectrum. The predictions made from this method are readily verifiable
through independent analysis and experimentation. The invention has the potential to
accelerate the development of new preventive therapeutics, which currently requires
high investment in terms of skilled labor and valuable time.

The present invention relates to a computational method for the identification of
candidate adhesin proteins of therapeutic potential. The invention particularly describes
a novel method to identify adhesin proteins in different genomes of pathogens. These
adhesin proteins can be used for developing preventive therapeutics.

Accordingly, the invention provides a computational method for identifying adhesin and
adhesin-like proteins of therapeutic potential, which comprises calculation of 105
compositional properties under the five sequence attributes, namely, amino acid
frequency, multiplet frequency, dipeptide frequency, charge composition and hydrophobic
composition, and then training an Artificial Neural Network (ANN, Feed Forward Error
Back Propagation) using these properties for differentiating between adhesin and
non-adhesin classes of proteins. This computational method involves quantifying 105
compositional attributes of query proteins and qualifying them as adhesins or
non-adhesins by a P_ad value (probability of being an adhesin). The present invention
is useful for identification of adhesin and adhesin-like proteins in pathogenic
organisms. These newly identified adhesin and adhesin-like proteins constitute a set of
candidates for development of new preventive therapeutics that can readily be tested in
suitable experimental model systems. In addition, the genes encoding the candidate
adhesin and adhesin-like proteins are provided.

The invention provides a set of candidate adhesin and adhesin-like proteins and their
coding genes for further evaluation as preventive therapeutics. The method of the invention
is based on the analysis of protein sequence attributes instead of sequence patterns
classified to functional domains. The present method is less dependent on sequence
relationships and therefore offers the potential power of identifying adhesins from
distantly related organisms. The invention provides a computational method, which
involves prediction of adhesin and adhesin-like proteins using Artificial Neural
Networks. The proteins termed adhesin were found to be predicted with a high
probability (P_ad > 0.51) in various pathogens. Some adhesin sequences turned out to be
identical or homologous to proteins that are antigenic or implicated in virulence. By
this approach, proteins could be identified and short-listed for further testing in
development of new vaccine formulations to eliminate diseases caused by various
pathogenic organisms.
DESCRIPTION OF TABLES

Table 1: Output file format given by SPAAN.

Table 2: Organism Name, Accession number, Number of base pairs, Date of release
and Total number of proteins.

Table 3: Prediction of well characterized adhesins from various bacterial pathogens
using SPAAN.

Table 4: Analysis of predictions made by SPAAN on genome scans of a few selected
pathogenic organisms.

Table 5: GI numbers and Gene IDs of new putative adhesins predicted by SPAAN in
the genomes listed in Table 2.

Table 6: GI numbers and Gene IDs of hypothetical proteins predicted as putative
adhesins by SPAAN in the genomes listed in Table 2.

Table 7: The list of 198 adhesins found in bacteria.

Brief description of the accompanying drawings
Figure 1 shows the Neural Network architecture.

Figure 2 shows assessment of SPAAN using a defined test dataset.

Figure 3 shows (a) histogram plots of the number of proteins in the various P_ad value
ranges; (b) pairwise sequence relationships among the adhesins, determined using
CLUSTAL W and plotted on the X-axis, where higher scores indicate similar
pairs; and (c) the plot for non-adhesins. Data are plotted in the 4 quadrant format for clear
inspection.

The software program was written in C language and operated on the Red Hat Linux 8.0
operating system. The computer program accepts input protein sequences in FASTA
format and produces a tabulated output. The output table contains one row for each
protein, listing the probability outputs of each of the five modules, a weighted average
probability of these five modules (P_ad), and the function of the protein as described in
the input sequence file. This software is called SPAAN (A Software for Prediction of
Adhesins and Adhesin-like proteins using Neural Networks) and a software copyright
has been filed. Although this software has multiple modules, the running of these
modules has been integrated and automated. The user only needs to run one
command.

AAcompo.c: 

Input: File containing protein sequences in the fasta format. 

Output: File containing frequencies of all 20 AAs for each protein in one row. 

charge.c:

Input: File containing protein sequences in the fasta format.

Output: File containing frequency of charged amino acids (R, K, E and D) and
moments (up to 18th order) of the positions of charged amino acids.

hdr.c: 

Input: File containing protein sequences in the fasta format.

Output: File containing frequencies of 5 groups of amino acids formed on the
basis of their hydrophobicity and moments of their positions up to 5th order.

multiplets.c: 

Input: File containing protein sequences in the fasta format.

Output: File containing fractions of multiplets of each of the 20 amino acids.

querydfpep.c:

Input: File 1 containing protein sequences in the fasta format.
File 2 containing the list of the significant dipeptides in dipeptide analysis.

Output: File containing frequencies of the dipeptides listed in the input File 2
for each protein in the input File 1.

train.c:

Input: File containing the following specifications:

1. Number of input and output parameters.
2. Number of nodes in the hidden layers.
3. Names of the training, validation and test data files.
4. Learning rate, coefficient of moment.
5. Maximum number of cycles for training.

Output: Outputs are as follows.

1. Output of the trained NN for the test data set.
2. Values of the weight connections in the trained NN.
3. Some extra information about training.

recognize.c:

Input: File containing the following specifications:

1. Number of input and output parameters.
2. Number of nodes in the hidden layers.
3. Name of the query input file.
4. Name of the file containing values of the weight connections for the
trained NN.
5. Name of the output file.

Output: Outputs for the query entries calculated by the trained NN.

standard.c:

Input: File containing protein sequences in fasta format.

Output: File containing protein sequences in fasta format with all the new line
characters lying within a sequence removed.

filter.c:

Input: File containing protein sequences in fasta format.

Output: File containing protein sequences from the input except those which
are short in length (<50 AAs) and those which contain any amino acid other than the
20 known amino acids.
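
The roles described for standard.c and filter.c can be sketched as follows. This is an illustrative Python sketch and not the original C code: the function names, the upper-casing of residues and the treatment of blank lines are assumptions, while the 50-residue minimum and the restriction to the 20 standard amino acids follow the descriptions above.

```python
STANDARD_AAS = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keep_sequence(seq, min_len=50):
    """True if seq has at least min_len residues, all among the 20 standard ones."""
    return len(seq) >= min_len and set(seq) <= STANDARD_AAS


def filter_records(records, min_len=50):
    """Drop records that are too short or contain non-standard residues."""
    return [(h, s) for h, s in records if keep_sequence(s, min_len)]
```

For example, `filter_records(read_fasta(open(path).read()))` would yield the cleaned records for a hypothetical input file `path`.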
The five attributes:

Amino Acid frequencies 

Amino acid frequency f_i = (counts of i-th amino acid in the sequence) / l; i = 1...20, where l is
the length of the protein.
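
As a minimal sketch of this definition (the alphabetical ordering of the one-letter codes and the function name are assumptions, not the layout used by AAcompo.c):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def amino_acid_frequencies(seq):
    """Return the 20 frequencies f_i = count_i / l for one protein sequence."""
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    return [seq.count(aa) / length for aa in AMINO_ACIDS]
```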
Multiplet frequency

Multiplets are defined as homopolymeric stretches (X)_n where X is any of the 20 amino
acids and n is an integer > 2. After identifying all the multiplets, the frequencies of the
amino acids in the multiplets were computed as
f_i(m) = (counts of i-th amino acid occurring as multiplet) / l
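
A minimal sketch of the multiplet computation follows. It assumes that the count for an amino acid is the number of residues lying inside qualifying homopolymeric runs, and it exposes the minimum run length as the parameter `min_run`, set to 3 to match "n > 2" as printed above. Both choices, like the function name, are assumptions rather than a description of multiplets.c.

```python
from itertools import groupby

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def multiplet_frequencies(seq, min_run=3):
    """Return 20 values f_i(m): residues of type i inside runs, divided by l."""
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue, run in groupby(seq):
        run_len = sum(1 for _ in run)
        if run_len >= min_run and residue in counts:
            counts[residue] += run_len
    return [counts[aa] / length for aa in AMINO_ACIDS]
```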
Dipeptide frequencies 

The frequency of a dipeptide (i, j) is f_ij = (counts of ij-th dipeptide) / (total dipeptide
counts); i, j range from 1 to 20.
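
The dipeptide frequency can be sketched as below for the 20 dipeptides named in this specification. Counting overlapping dipeptides (so that a sequence of length l has l - 1 of them) is an assumption, as is the function name.

```python
SELECTED_DIPEPTIDES = [
    "NG", "RE", "TN", "NT", "GT", "TT", "DE", "ER", "RR", "RK",
    "RI", "AT", "TS", "IV", "SG", "GS", "TG", "GN", "VI", "HR",
]


def dipeptide_frequencies(seq, dipeptides=SELECTED_DIPEPTIDES):
    """Return f_ij = count_ij / (total dipeptide count) for each listed dipeptide."""
    total = len(seq) - 1  # number of overlapping dipeptides (assumption)
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = {}
    for i in range(total):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return [counts.get(dp, 0) / total for dp in dipeptides]
```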

It has been found that dipeptide repeats in proteins are important for functional
expression of the clumping factor present on the Staphylococcus aureus cell surface that
binds to fibrinogen (Hartford et al 1999). Thus we included the dipeptide frequency
module. The total number of dipeptides is 400. For optimal training of a Neural Network,
the ratio of the total number of input vectors to the total number of weight connections
must be around 2 to avoid over fitting (Andrea et al). Therefore, we identified the
dipeptides whose frequencies in the adhesin data set (469 proteins, see database
construction) were significantly different from those in the non-adhesin dataset (703
proteins) using the t-test. The frequencies of the top 20 dipeptides (when arranged in the
descending order of the p-values of t-test) were fed to the Neural Network. These
dipeptides were (using single letter IUPAC-IUB code) NG, RE, TN, NT, GT, TT, DE,
ER, RR, RK, RI, AT, TS, IV, SG, GS, TG, GN, VI, and HR. With frequency inputs
for 20 dipeptides and 28 neurons in the 2nd layer, the total number of weight
connections is 588, and is in keeping with the criterion of avoiding over fitting.
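
The figure of 588 can be reproduced with a short calculation. The decomposition below is one reading that matches the stated number: it assumes a single output neuron and does not count bias weights, and it takes "input vectors" to mean the 469 + 703 proteins of the two data sets. None of these assumptions is spelled out in the text.

```python
def weight_connections(n_inputs, n_hidden, n_outputs=1):
    """Connections in a fully connected one-hidden-layer network, biases excluded."""
    return n_inputs * n_hidden + n_hidden * n_outputs


n_weights = weight_connections(20, 28)  # 20*28 + 28*1
n_proteins = 469 + 703                  # adhesin + non-adhesin data sets
ratio = n_proteins / n_weights          # close to the target ratio of 2
```

Under these assumptions the ratio comes out just under 2, which is consistent with the over-fitting criterion quoted above.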
Charge composition

The input frequency of charged amino acids (R, K, E and D, considering the ionization
properties of the side chains at pH 7.2) is given by f_c = (counts of charged amino acids) / l.
Further, information on the characteristics of the distribution of the charged amino
acids in a given protein sequence was provided by computing the moments of the
positions of the occurrences of the charged amino acids. Since moments characterize
the patterns of distribution, such as skewness and kurtosis (sharpness of the peak), we
have used them to represent the distribution patterns of the charged residues in the
sequence.

The general expression to compute moments of a given order, say r, is

$$M_r = \frac{1}{N} \sum_{i=1}^{N} (X_i - X_m)^r$$

where M_r = r-th order moment of the positions of charged amino acids,
X_m = mean of all positions of charged amino acids,
X_i = position of the i-th charged amino acid, and
N = number of charged amino acids in the sequence.

The moments of 2nd to 19th order were used to train the ANN, constituting a total of 20
inputs together with the frequency of charged amino acids and the length of the protein. The
upper limit of 19th order was set based on assessments of sensitivity and specificity on a
small dataset of adhesins and non-adhesins. Moments of order greater than 19 were not
useful in improving performance.
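
The 20 charge inputs can be sketched as follows, assuming that each moment is the central moment about the mean position of the charged residues, that positions are counted from 1, and that the raw values are used without scaling. The function names and the ordering of the returned values are hypothetical.

```python
CHARGED = set("RKED")


def central_moment(positions, order):
    """Mean of (x - mean)^order over the given positions."""
    n = len(positions)
    mean = sum(positions) / n
    return sum((x - mean) ** order for x in positions) / n


def charge_inputs(seq, min_order=2, max_order=19):
    """Return [f_c, length, M_2, ..., M_19]: 20 values for the default orders."""
    length = len(seq)
    positions = [i + 1 for i, aa in enumerate(seq) if aa in CHARGED]
    if not positions:
        raise ValueError("sequence has no charged residues")
    freq = len(positions) / length
    moments = [central_moment(positions, r)
               for r in range(min_order, max_order + 1)]
    return [freq, float(length)] + moments
```

High-order moments grow very quickly with sequence length, so some normalization before training would be expected in practice; the text does not describe one, and none is applied here.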

Hydrophobic composition

A given protein sequence was digitally transformed using the hydrophobic scores of the
amino acids according to Brendel et al (43). The scores for the five groups of amino acids are:
(-8 for K, E, D, R), (-4 for S, T, N, Q), (-2 for P, H), (+1 for A, G, Y, C, W), (+2 for L,
V, I, F, M).

The following inputs were given for each group:
(a) f_i = (counts of i-th group) / (total counts in the protein); i ranges from 1 to 5
(b) m_ji = j-th order moment of positions of amino acids in i-th group; j ranges from 2 to 5.



A total of 25 inputs representing the hydrophobic composition of a protein were fed to
the Neural Network. The rationale for using moments was the same as described in the
section on charge composition inputs.
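
The 25 hydrophobic inputs (5 group frequencies plus 4 moments for each of the 5 groups) can be sketched as below. The sketch assumes central moments about each group's mean position, positions counted from 1, and a value of zero for the moments of a group that is absent from the sequence. The function name and output ordering are hypothetical.

```python
# Five groups in the order of their scores: -8, -4, -2, +1, +2.
HYDROPHOBIC_GROUPS = ["KEDR", "STNQ", "PH", "AGYCW", "LVIFM"]


def hydrophobic_inputs(seq, orders=(2, 3, 4, 5)):
    """Return 5 group frequencies followed by 4 moments for each group."""
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    freqs, moments = [], []
    for group in HYDROPHOBIC_GROUPS:
        positions = [i + 1 for i, aa in enumerate(seq) if aa in group]
        freqs.append(len(positions) / length)
        for order in orders:
            if positions:
                mean = sum(positions) / len(positions)
                value = sum((x - mean) ** order for x in positions) / len(positions)
            else:
                value = 0.0  # assumption: an absent group contributes zero
            moments.append(value)
    return freqs + moments
```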

Taken together, a total of 105 compositional properties of a given protein sequence were
used to predict its adhesin characteristics.

The software PropSearch uses 144 compositional properties of protein sequences to
detect possible structural or functional relationships between a new sequence and
sequences in the database (Hobohm and Sander 1995). The approach defines protein
sequence dissimilarity (or distance) as a weighted sum of differences of compositional
properties such as singlet and doublet amino acid composition, molecular weight and
isoelectric point (protein property search or PropSearch). Compositional properties of
proteins have also been used for predicting secretory proteins in bacteria and apicoplast
targeted proteins in Plasmodium falciparum (Zuegge, et al. 2001). The methods used
there are statistical methods, principal component analysis, self-organizing maps, and
supervised neural networks. In SPAAN, we have used 105 compositional properties in
the five modules, viz. Amino Acid frequencies, Multiplet frequencies, Dipeptide
frequencies, Charge composition and Hydrophobic composition. The total of 105
properties used in SPAAN comprises 20 for Amino acid frequencies, 20 for Multiplet
frequencies, 20 for Dipeptide frequencies (the top 20 significant dipeptides are used, based
on t-test), 20 for Charge composition (frequency of charged amino acids (R, K, E and
D) and moments of 2nd to 19th order), and 25 for Hydrophobic composition. For the
last of these, amino acids were classified into five groups: (-8 for K, E, D, R), (-4 for
S, T, N, Q), (-2 for P, H), (+1 for A, G, Y, C, W), (+2 for L, V, I, F, M). The 25 inputs
consisted of the frequency of each group and the moments of positions of amino acids
in each group from 2nd to 5th order.

Neural Network

A feed forward error back propagation Neural Network was used. The program is a 
kind gift from Charles W. Anderson, Department of Computer Science, Colorado State 
University, Fort Collins, CO 80523, anderson@cs.colostate.edu 
Neural Network architecture

The Neural Network used here has a multi-layer feed-forward topology. It consists of
an input layer, one hidden layer and an output layer. This is a 'fully-connected' Neural
Network where each neuron i is connected to each unit j of the next layer (Figure 1).
The weight of each connection is denoted by w_ij. The state I_i of each neuron in the input
layer is assigned directly from the input data, whereas the states of hidden layer
neurons are computed by the sigmoid function,

$$h_j = \frac{1}{1 + \exp\left(-\left(w_{j0} + \sum_i w_{ij} I_i\right)\right)}$$

where w_j0 is the bias weight.
The back propagation algorithm was used to minimize the differences between the
computed output and the desired output. Ten thousand cycles (epochs) of iterations were
performed. Subsequently, the best epoch with minimum error was identified. At this
point the network produces approximate target values for a given input in the training
set.

A network was trained optimally for each attribute. Thus five networks were prepared.
The schematic diagram (Figure 1) shows the procedure adopted. The number of
neurons in the input layer was equal to the number of input data points for each
attribute (for example 20 neurons for 20 numerical input vectors of the amino acid
composition attribute). The optimal number of neurons in the hidden layer was
determined through experimentation for minimizing the error at the best epoch for each
network individually. An upper limit for the total number of weight connections was set
to half of the total number of input vectors to avoid overfitting as suggested previously
(Andrea et al.).

Computer programs to compute individual compositional attributes were written in C
and executed on a PC under Red Hat Linux ver 7.3 or 8.0. The network was trained on
the training set, and the error was checked and optimized on the validate set through back
propagation. The validate set was different from the training set. Since the number of
well annotated adhesins was small, we used the 'validate set' itself as the test set for
preliminary evaluation of the performance and to obtain the fraction of correlation used to
compute the weighted average probability (Pad value) described in the next section. The
training set had 367 adhesins and 580 non-adhesins. The validate set had 102 adhesins
and 123 non-adhesins. The adhesins were qualified with a digit '1' and the non-adhesins
were qualified with a digit '0'.

During predictions, the network is fed with new data from the sequences that were not
part of the training set. Each network assigns a probability value of being an adhesin to a
given sequence. The final probability is computed as described in the next section.
Probability of being an adhesin, the Pad value
Query proteins are processed modularly through the network trained for each attribute.
Thus, five probability outputs are obtained. The final prediction was computed using the
following expression, which is a weighted linear sum of the probabilities from the five
modules:

$$P_{ad} = \frac{P_A\, fc_A + P_C\, fc_C + P_D\, fc_D + P_H\, fc_H + P_M\, fc_M}{fc_A + fc_C + fc_D + fc_H + fc_M}$$

P_i = Probability from i module,

fc_i = fraction of correlation of i module of the trained Neural Network,
where i = A (Amino acid frequencies), C (Charge composition), D (Dipeptide
frequencies), H (Hydrophobic composition), or M (Multiplet frequencies).

The fraction of correlation fc_i represents the fraction of total entries that were correctly
predicted (P_i,adhesin > 0.5 and P_i,non-adhesin < 0.5) by the trained network on the test set
used in preliminary evaluation (Charles Anderson).
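
As an illustration of the weighted average described in this section, a minimal Python sketch is given below. The function names and the dictionary keys are hypothetical, and the fraction of correlation values in the usage example are made-up numbers, not the values obtained for SPAAN.

```python
def fraction_of_correlation(probabilities, labels, cutoff=0.5):
    """Fraction of entries called correctly by one module.

    labels use 1 for adhesin and 0 for non-adhesin; an adhesin is counted as
    correct when its probability is above the cutoff, a non-adhesin when it
    is below the cutoff.
    """
    correct = sum(
        1 for p, y in zip(probabilities, labels)
        if (y == 1 and p > cutoff) or (y == 0 and p < cutoff)
    )
    return correct / len(labels)


def pad_value(module_probs, module_fc):
    """Weighted linear sum of the five module probabilities (A, C, D, H, M)."""
    modules = "ACDHM"
    numerator = sum(module_probs[m] * module_fc[m] for m in modules)
    denominator = sum(module_fc[m] for m in modules)
    return numerator / denominator


if __name__ == "__main__":
    probs = {"A": 0.65, "C": 0.91, "D": 0.44, "H": 0.54, "M": 0.92}
    fc = {"A": 0.8, "C": 0.7, "D": 0.6, "H": 0.75, "M": 0.85}  # hypothetical weights
    print(round(pad_value(probs, fc), 3))
```

Because the weights are normalised by their sum, a module with a higher fraction of correlation pulls the final value further towards its own probability.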
Neural Network 

A feed forward error back propagation Neural Network was used. The program was
downloaded from the web site with permission from the author, Charles W. Anderson,
Department of Computer Science, Colorado State University, Fort Collins, CO 80523,
anderson@cs.colostate.edu
Statistical Analysis 

All statistical procedures were carried out using Microsoft Excel (Microsoft
Corporation Inc. USA).
Sequence analysis 

Homology analysis was carried out using CLUSTAL W (Thompson et al 1994),
BLAST (Altschul et al 1990) and CDD (conserved domain database) search
(Marchler-Bauer et al 2002).

The whole genome sequences of microbial pathogens present new opportunities for the 
development of clinical applications such as diagnostics and vaccines. The present 
invention provides new leads for the development of candidate genes, and their 
encoded proteins in their functional relevance to preventive therapeutics. 

The protein sequences of both the classes, i.e. adhesin and non-adhesin, were
downloaded from the existing database (National Center for Biotechnology Information
(NCBI), USA). A total of 105 compositional properties under the five sequence
attributes, namely amino acid composition, multiplet composition, dipeptide
composition, charge composition and hydrophobic composition, were computed by
computer programs written in C language. The attributes were computed for all the
proteins in both the databases. The sequence-based attributes were then used to train
an Artificial Neural Network for each of the protein attributes. Adhesins were qualified by
the digit '1' and non-adhesins were qualified by the digit '0'. Finally, each trained
Artificial Neural Network was used to identify potential adhesins which can be
envisaged to be useful for the development of preventive therapeutics against
pathogenic infections. Accordingly, the invention provides a computational method for
identifying adhesin and adhesin-like proteins of therapeutic potential, which comprises:

1. preparing two comprehensive data-sets of adhesin and non-adhesin proteins from
publicly available information on protein sequences,

2. calculating computationally the sequence based attributes of the protein sequences in
the publicly available protein datasets using specially developed Software for
Prediction of Adhesins and Adhesin-like proteins using Neural Networks (SPAAN),

3. training the Artificial Neural Network (ANN) for the selected attributes,

4. assigning a probability value suitable for an adhesin, "Pad", to the query protein and
identifying adhesin like property in the query proteins with the help of the trained Artificial
Neural Network implemented in SPAAN,

5. validating computationally the protein sequences as therapeutic potentials by
comparing with the known protein sequences that are biochemically characterized in
the pathogen genome.

In an embodiment of the invention, the protein sequence data may be taken from an
organism, specifically but not limited to organisms such as Escherichia coli,
Haemophilus influenzae, Helicobacter pylori, Mycoplasma pneumoniae,
Mycobacterium tuberculosis, Rickettsia prowazekii, Porphyromonas gingivalis,
Shigella flexneri, Streptococcus mutans, Streptococcus pneumoniae, Neisseria
meningitidis, Streptococcus pyogenes, Treponema pallidum and Severe Acute Respiratory
Syndrome associated coronavirus.

In another embodiment of the present invention, the different sequence-based attributes
used for identification of proteins of therapeutic potential comprise amino acid
composition, charge composition, hydrophobicity composition, multiplets frequencies,
and dipeptide frequencies.
In an embodiment, the non-homologous adhesin protein sequence may be compared
with that of known sequences of therapeutic applications in the selected pathogens.
In an embodiment of the invention, the sequences of adhesin or adhesin like proteins
comprise sequences of the sequence IDs listed in Tables 5 and 6 identified by the method
of the invention.

In another embodiment of the invention, the computer system comprises a central
processing unit executing the SPAAN program, giving probabilities based on different
attributes using the Artificial Neural Network and other in-built programs for assessing
attributes, all stored in a memory device accessed by the CPU; a display on which the
central processing unit displays the screens of the above mentioned programs in
response to user inputs; and a user interface device.

In one embodiment of the present invention, the particulars of the organisms such as
their name, strain, accession number in NCBI database and other details are given in
Table 2.

The invention is further explained with the help of the following examples, which are
given by way of illustration and should not be construed to limit the scope of the present
invention in any manner.
Example 1 
Operating SPAAN: 

The purpose of the program is to computationally calculate various sequence-based
attributes of the protein sequences.
The program works as follows:

The FASTA format files downloaded from
http://www.ncbi.nlm.nih.gov were saved by the name <organism_name>.faa,
converted to the standard format by a C program and passed as input to another set of C
programs which compute the 5 different attributes of protein sequences (a total of 105
compositional properties in all 5 modules).
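
The first step, reading the downloaded FASTA file, can be sketched in Python as follows. This is an illustration only and not the C program used in SPAAN. The function names are hypothetical, and the header is assumed to follow the NCBI style in which the fields after 'gi' are separated by a vertical line character.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gi_number(header):
    """Return the GI number from a header such as 'gi|<number>|...' (assumed layout)."""
    fields = header.split("|")
    if len(fields) > 1 and fields[0] == "gi":
        return fields[1]
    return None


if __name__ == "__main__":
    demo = ">gi|1|gb|X1|demo protein\nMAHHEQ\nQQQ\n\n>gi|2|gb|X2|other protein\nMIGDG\n"
    for head, seq in parse_fasta(demo):
        print(gi_number(head), len(seq))
```

Sequence lines belonging to one record are joined, so a protein wrapped over several lines in the .faa file is returned as a single string ready for the attribute calculations.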

The computed properties were fed as input to the 5 different Neural Networks. Each
trained network assigns a probability value of being an adhesin for a query protein. The
final probability (Pad) was calculated as a weighted average of these five individual
probabilities. The weights were determined from a correlation value of correct
prediction during test runs of each of the five modules.
Input/Output format:
Downloaded Files and their format:

<organism_name>.faa: file which stores the annotation and the protein sequence.
Input file Format: FASTA
'>gi|' <annotation>
For example,

>gi|2314605|gb|AAD08472|histidine and glutamine-rich protein
MAHHEQQQQQQANSQHHHHHHHAHHHHYYGGEHHHHNAQQHAEQQAEQQ
AQQQQQQQAHQQQQQKAQQQNQQY

>gi|3261822|gnl|PID|e328405 PE_PGRS
MIGDGANGGPGQPGGPGGLLYGNGGHGGAGAAGQDRGAGNSAGLIGNGGAG
GAGGNGGIGGAGAPGGLGGDGGKGGFADEFTGGFAQGGRGGFGGNGNTGAS
GGMGGAGGAGGAGGAGGLLIGDGGAGGAGGIGGAGGVGGGGGAGGTGGGG
VASAFGGGNAFGGRGGDGGDGGDGGTGGAGGARGAGGAGGAGGWLSGHSG
AHGAMGSGGEGGAGGGGGARGEAGAGGGTSTGTNPGKAGAPGTQGDSGDP
GPPG

>gi|...

Table 1: Output file format given by SPAAN
<organism_name>.out

SN  Pa        Pc        Pd        Ph        Pm        Pad value  Protein Name
1   0.05683   0.290803  0.441338  0.50304   0.029503  0.260485   >gi|32454344|gb|AAP82966.1|orf1a polyprotein [SARS coronavirus Hong Kong ZY-2003]
2   0.639235  0.166721  0.054583  0.935385  0.453498  0.462452   >gi|32454345|gb|AAP82967.1|orf1ab polyprotein [SARS coronavirus Hong Kong ZY-2003]
3   0.651111  0.911504  0.438696  0.543944  0.924044  0.690247   >gi|32454346|gb|AAP82968.1|spike glycoprotein [SARS coronavirus Hong Kong ZY-2003]
4   0.464324  0.655003  0.179503  0.008700  0.241573  0.300970   >gi|32454347|gb|AAP82969.1|Orf3a [SARS coronavirus Hong Kong ZY-2003]

Where Pa, Pc, Pd, Ph, Pm are the outputs of the five Neural Networks.
Example 2
Organisms and sequence numbers

Table 2: Organism Name, Accession number, Number of base pairs, Date of release
and Total number of proteins analyzed

Organism Name                             Accession Number  Number of base pairs  Date of release  Total no. of proteins
E. coli O157:H7                           NC_002695         5498450               7-Mar-2001       5361
H. influenzae Rd                          NC_000907         1830138               30-Sep-1996      1709
H. pylori J99                             NC_000921         1643831               10-Sep-2001      1491
M. pneumoniae                             NC_000912         816394                2-Apr-2001       689
M. tuberculosis H37Rv                     NC_000962         4411529               7-Sep-2001       3927
R. prowazekii strain Madrid E             NC_000963         1111523               10-Sep-2001      835
P. gingivalis W83                         NC_002950         2343476               9-Sep-2003       1909
S. flexneri 2a str. 2457T                 NC_004741         4599354               23-Apr-2003      4072
S. mutans UA159                           NC_004350         2030921               25-Oct-2002      1960
S. pneumoniae R6                          NC_003098         2038615               6-Sep-2001       2043
N. meningitidis serogroup A strain Z2491  NC_003116         2184406               27-Sep-2001      2065
S. pyogenes MGAS8232                      NC_003485         1895017               Jan 31, 2002     1845
T. pallidum subsp. pallidum str. Nichols  NC_000919         1138011               7-Sep-2001       1036
Severe Acute Respiratory Syndrome (SARS)  AY291315          29727                 11-JUN-2003      14
  associated coronavirus Frankfurt 1
SARS coronavirus HSR 1                    AY323977          29751                 15-OCT-2003      14
SARS coronavirus ZJ01                     AY297028          29715                 19-MAY-2003      3
SARS coronavirus TW1                      AY291451          29729                 14-MAY-2003      11
SARS coronavirus CUHK-Su10                AY282752          29736                 07-MAY-2003      4
SARS coronavirus Urbani                   AY278741          29727                 12-AUG-2003      12
SARS coronavirus                          NC_004718         29751                 9-Sep-2003       29
SARS coronavirus Tor2                     AY274119          29751                 16-MAY-2003      15
SARS coronavirus GD01                     AY278489          29757                 18-AUG-2003      12
SARS coronavirus CUHK-W1                  AY278554          29736                 31-JUL-2003      11
SARS coronavirus BJ01                     AY278488          29725                 01-MAY-2003      11


Example 3 

The multi-layered feed forward Neural Network architecture implemented in SPAAN
(figure 1). A given protein sequence in FASTA format is first processed through the 5
modules A, C, D, H, and M to quantify the five types of compositional attributes. A:
Amino acid composition, C: Charge composition, D: dipeptide composition of the 20
dipeptides (NG, RE, TN, NT, GT, TT, DE, ER, RR, RK, RI, AT, TS, IV, SG, GS, TG,
GN, VI, HR), H: Hydrophobic composition, M: Amino acid frequencies as Multiplets.
The sequence shown is part of the FimH precursor (gi 5524634) of E. coli.
Subsequently, these numerical data are input to the input neuron layer. The directions
of arrows show data flow. The number of neurons chosen in the input layer was equal
to the number of the numerical input vectors of each module. The network was
optimally trained through minimization of error of detection based on the validate set
through back propagation. The details are described in the methods. Each network
module assigns a probability value of the protein being an adhesin based on the
corresponding attribute. The final probability of a protein sequence being an adhesin is
the Pad value, a weighted average of the individual probabilities in which each weight is
the associated fraction of correlation, a measure of the strength of the prediction.
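
The dipeptide module (D) can be illustrated with a short Python sketch that counts the 20 dipeptides listed above. The function name is hypothetical, and normalising each count by the number of overlapping residue pairs in the sequence is an assumption made for the example; the C program used in SPAAN may normalise differently.

```python
# The 20 dipeptides named in the text for module D.
DIPEPTIDES = [
    "NG", "RE", "TN", "NT", "GT", "TT", "DE", "ER", "RR", "RK",
    "RI", "AT", "TS", "IV", "SG", "GS", "TG", "GN", "VI", "HR",
]


def dipeptide_frequencies(sequence):
    """Return 20 frequencies, one per listed dipeptide, in the listed order."""
    sequence = sequence.upper()
    # Overlapping pairs: residues (1,2), (2,3), (3,4), ...
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    total = len(pairs)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [pairs.count(d) / total for d in DIPEPTIDES]


if __name__ == "__main__":
    print(dipeptide_frequencies("MNGTTNG"))
```

The returned list has one entry per input neuron of the dipeptide network, so it can be passed on without further reordering.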
Example 4 

Performance of SPAAN was assessed using a test set of 37 adhesins and 37 non-adhesins
that were not part of the training set. Matthew's correlation coefficient (Mcc, plotted on
Y-axis) was computed for all the proteins with Pad values above a given threshold
(plotted on X-axis) (figure 2). The Matthew's correlation is defined as:

$$Mcc = \frac{(TP \cdot TN) - (FP \cdot FN)}{\sqrt{(TN + FN)(TN + FP)(TP + FN)(TP + FP)}}$$

where TP = True Positives, TN = True Negatives, FP = False Positives, FN = False
Negatives.

Here TPs are adhesins and TNs are non-adhesins. In general, adhesins have a high Pad value,
whereas non-adhesins have a low Pad value. Thus known adhesins with a Pad value above a
given threshold are true positives whereas known non-adhesins with a Pad value below
the given threshold are true negatives. The sensitivity, Sn, is given by

$$Sn = \frac{TP}{TP + FN}$$

and the specificity, Sp, is given by

$$Sp = \frac{TP}{TP + FP}$$

False negatives are those cases wherein a known adhesin had a Pad value lower than the
chosen threshold. Similarly, a known non-adhesin with a Pad value higher than the chosen
threshold was taken as a false positive. A
theoretical polynomial curve of second order (dashed line) was fitted to the observed
curve (smooth line) with a Karl-Pearson correlation coefficient R = 0.9799. The
maximum point of the theoretical curve (where the first derivative vanishes and the second
derivative is negative) was chosen as reference (vertical dotted line) to identify the
maximum Mcc = 0.94 on the observed curve (shown by arrow). The corresponding Pad
value threshold was 0.51. At this Pad value threshold, Sn and Sp were 0.89 and 1.0
respectively. Note that the Mcc does not drop down to the x-axis because the highest
Pad value attained by adhesins was 0.939 in comparison to the theoretical attainable
limit of 1.0.
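
The quantities used in this example can be computed with the following Python sketch. The function names are hypothetical, a protein is assumed to be called an adhesin when its Pad value is at or above the threshold, and Sp is implemented exactly as the ratio TP / (TP + FP) stated in the text (a ratio that is often called precision elsewhere).

```python
import math


def confusion_counts(pad_values, labels, threshold=0.51):
    """Return (TP, TN, FP, FN); labels use 1 for adhesin, 0 for non-adhesin."""
    tp = tn = fp = fn = 0
    for pad, label in zip(pad_values, labels):
        called_adhesin = pad >= threshold  # assumption: ties count as adhesin
        if label == 1 and called_adhesin:
            tp += 1
        elif label == 1:
            fn += 1
        elif called_adhesin:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthew's correlation coefficient; 0.0 when the denominator vanishes."""
    denominator = math.sqrt((tn + fn) * (tn + fp) * (tp + fn) * (tp + fp))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity_as_in_text(tp, fp):
    """Sp as defined in the text: TP / (TP + FP)."""
    return tp / (tp + fp)


if __name__ == "__main__":
    counts = confusion_counts([0.9, 0.6, 0.3, 0.7, 0.2], [1, 1, 1, 0, 0])
    print(counts, round(mcc(*counts), 3))
```

Repeating the calculation over a range of thresholds gives the kind of Mcc versus threshold curve described for figure 2.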
Example 5 

Assessment of SPAAN on well known adhesins from various bacterial pathogens.
Table 3. Prediction of well characterized adhesins from various bacterial pathogens
using SPAAN.

Species                    Disease caused                          Adhesin (a)               Host ligand                                               Pad value (b) (Range)
E. coli                    Diarrhoea                               PapG (27)                 α-D-gal(1-4) β-D-Gal-containing receptors                 0.84-0.76
                                                                   SfaS (5)                  alpha-sialyl-beta-2,3-b-galactose                         0.94-0.94
                                                                   FimH (63)                 D-mannosides                                              0.96-0.23 (c)
                                                                   Intimin (12)              tyrosine-phosphorylated form of host cell receptor Hp90  0.95-0.78
                                                                   PrsG (5)                  Gal(alpha1-4)Gal                                          0.86-0.85
Nontypeable H. influenzae  Influenza                               HMW1, HMW2                Human epithelial cells                                    0.97
                                                                   Hia (8)                   human conjunctival cells                                  0.93-0.90
H. influenzae              bacterial meningitis (d)                HifE (18)                 Sialylganglioside-GM1                                     0.85-0.73
K. pneumoniae              Pneumonia                               MrkD                      type V collagen                                           0.8?
B. pertussis               Whooping cough                          FHA                       Sulphated sugars on cell-surface glycoconjugates          0.85
                                                                   Pertactin                 Integrins                                                 0.43
Y. enterocolitica          Enterocolitis                           YadA (5)                  β1 integrins                                              0.88-0.79
S. mutans                  Dental Caries                           SpaP (2)                  Salivary glycoprotein                                     0.88, 0.87
                                                                   PAc                       Salivary glycoprotein                                     0.88
Streptococcus gordonii     Oral cavity                             SspA (2)                  Salivary glycoprotein                                     0.85, 0.84
                                                                   CshA                      Fibronectin                                               0.78
                                                                   CshB                      Fibronectin                                               0.63
                                                                   ScaA                      Co-aggregation                                            0.71
                                                                   SspB (2)                  Salivary glycoprotein                                     0.85, 0.84
Streptococcus sobrinus     Tooth decay                             SpaA                      Salivary glycoprotein                                     0.89
                                                                   PAg (2)                   Salivary glycoprotein                                     0.89, 0.73
Streptococcus              Scarlet Fever                           Protein F                 Fibronectin                                               0.49
Streptococcus pneumoniae   Bacterial Pneumonia                     PsaA (5)                  Human nasopharyngeal cells                                0.82-0.78
                                                                   CbpA (e)/SpsA/PbcA/PspC   phosphorylcholine of the teichoic acid                    0.81-0.49
Streptococcus parasanguis  Valve endocarditis                      FimA                      Salivary glycoprotein fibrin                              0.76
Streptococcus sanguis      Tooth Decay                             SsaB                      Salivary glycoprotein                                     0.71
Enterococcus faecalis      Empyema in patients with liver disease  EfaA                      Unknown                                                   0.83
Staphylococcus aureus      Food Poisoning                          FnbA                      Fibronectin                                               0.8
                                                                   FnbB (3)                  Fibronectin                                               0.78, 0.77, 0.69
Helicobacter pylori        Peptic Ulcers                           BabA (17)                 difucosylated Lewis b blood group antigen                 0.87-0.68


a : The number of sequences from different strains and homologs from related species
analyzed are shown in parentheses.
b : Rounded off to the second decimal.
c : Out of 63 FimH proteins, 54 were from E. coli, 6 from Shigella flexneri, 2 from
Salmonella enterica and 1 was from Salmonella typhimurium. Except 2 FimH proteins,
the rest had Pad ≥ 0.51. The 2 exceptions (gi numbers: 5524636, 1778448) were from E.
coli. The gi:5524636 protein is annotated as a FimH precursor but is much shorter (129
amino acids) than other members of the family. The gi:1778448 protein is a S.
typhimurium homolog in E. coli.

d : Other ailments include pneumonia, epiglottitis, osteomyelitis, septic arthritis and
sepsis in infants and older children.

e : The adhesin CbpA is also known by alternative names SpsA, PbcA and PspC. A total
of seven sequences were analyzed. Except 1 PspC sequence, the rest all had Pad ≥ 0.51.
Example 6 

Ability of SPAAN to discriminate adhesins from non-adhesins at Pad ≥ 0.51 (figure 3a).

Example 7

The non-homology character of SPAAN was assessed in both adhesins and non-adhesins
(figure 3b and 3c).

Figure 3 (a - c). SPAAN is non-homology based software. A total of 130 adhesins and
130 non-adhesins were analyzed to assess whether the predictive power of SPAAN
could be influenced by the sequence relationships. (a) Histogram plots of the number of
proteins in the various Pad value ranges are shown. Shaded bars represent adhesins
whereas open bars represent non-adhesins. Note SPAAN's ability to segregate
adhesins and non-adhesins into two distinct cohesive groups. (b) Pairwise sequence
relationships among the adhesins were determined using CLUSTAL W and plotted on
the X-axis. Higher scores indicate similar pairs. The corresponding difference in Pad
value (ΔPad) in the same protein pair was plotted on the Y-axis. Each point in the diagram
represents a pair. The arrow points to protein pairs of the FimH family with high ΔPad
values in spite of high similarity: since one of the FimH proteins (gi: 5524636) had
a very low Pad value, all pairs with this false negative protein show high ΔPad values. The
protein (gi: 5524636) is of much shorter length compared with other members of the
same family. (c) Similar plot for non-adhesins. Data are plotted in the 4 quadrant format for
clear inspection. Note that among protein pairs with CLUSTAL W score < 20 the
majority (82% in adhesins and 86% in non-adhesins) have ΔPad < 0.2. These data
support the non-homology character of SPAAN.
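
The pairwise comparison summarised in the figure legend can be sketched in Python as follows. The function names, the dictionary layout and the demonstration numbers are hypothetical; the pairwise similarity scores are assumed to have been obtained beforehand (for example from CLUSTAL W) and are not computed here.

```python
def pair_differences(pad_by_protein, score_by_pair):
    """Return (similarity score, absolute difference in Pad value) for each pair.

    pad_by_protein maps a protein identifier to its Pad value;
    score_by_pair maps a (protein_a, protein_b) tuple to its pairwise score.
    """
    return [
        (score, abs(pad_by_protein[a] - pad_by_protein[b]))
        for (a, b), score in score_by_pair.items()
    ]


def fraction_low_delta(pairs, max_score=20.0, max_delta=0.2):
    """Among pairs with score < max_score, the fraction with difference < max_delta."""
    selected = [delta for score, delta in pairs if score < max_score]
    if not selected:
        return 0.0
    return sum(1 for delta in selected if delta < max_delta) / len(selected)


if __name__ == "__main__":
    pads = {"x": 0.9, "y": 0.8, "z": 0.1}               # made-up Pad values
    scores = {("x", "y"): 10, ("x", "z"): 15, ("y", "z"): 90}  # made-up scores
    print(fraction_low_delta(pair_differences(pads, scores)))
```

A high fraction for dissimilar pairs indicates that proteins of the same class receive similar Pad values even when their sequences are unrelated.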
Example 8

Genome scan of pathogens by SPAAN identifies well known adhesins and new
adhesins and adhesin-like proteins

Table 4. Analysis of predictions made by SPAAN on genome scans of a few selected
pathogenic organisms (a)

Protein class                                                 Escherichia coli O157:H7  Mycobacterium tuberculosis H37Rv  SARS associated coronavirus (11 strains)
Total number of proteins with Pad ≥ 0.51                      575                       435                               5
Known adhesins                                                17 (b)                    -                                 -
Putative proteins with adhesin like characteristics           92 (c)                    105 (j)                           -
Hypothetical proteins with adhesin-like characteristics       22 (d)                    -                                 -
Proteins likely to be extracytoplasmic or located at surface  190 (e)                   191 (k)                           5 (m)
Phage proteins                                                30 (f)                    -                                 -
Others                                                        13 (g)                    6 (l)                             -
Hypothetical proteins                                         157 (h)                   86 (h)                            -
Wrong predictions                                             54 (i)                    47 (i)                            -


a : SPAAN has general applicability. The three pathogens chosen here are those in
which intense investigations are being conducted presently. M. tuberculosis is of
special importance to developing countries.
b : Fimbrial adhesins, AidA-I, gamma intimin, curlin, translocated intimin receptor,
putative adhesin and transport, Iha, prepilin peptidase dependent protein C.
c : These proteins have been annotated as proteins with a putative function. These
sequences were analyzed using CDD (Conserved domain database, NCBI) and BLAST
searches. Adhesin like domains were found in these proteins.
d : These proteins have been annotated as 'hypothetical'. These sequences were
analyzed using CDD and BLAST searches. Adhesin like domains were found in these
proteins.
e : These proteins are outer membrane, extracellular, transport, surface, exported,
flagellar, periplasmic lipoprotein, and proteins annotated as 'hypothetical' but found to
have similar functions listed here using BLAST and CDD searches.
f : The phage proteins were of the following functional roles - tail fiber, head
decoration, DNA injection, tail, major capsid, host specificity, endolysin.
g : Proteins predicted by SPAAN but not readily classifiable into the classes listed here
have been collectively grouped as 'Others'. However, some of these proteins are known
to participate in host-pathogen interactions. The annotated functional roles are type III
secretion, antibiotic resistance, heat shock, acid shock, structural, tellurium resistance,
terminase, Hep-like, Sec-independent translocase, uncharacterized nucleoprotein,
HicB-like.
h : These proteins have been annotated as hypothetical. Re-analyses of these proteins
using BLAST and CDD failed to identify any function for these proteins.
i : These proteins have been annotated with functional roles that are very likely to occur
within the cell. Hence these proteins have only a remote possibility of functioning as
adhesins or adhesin-like proteins. Therefore this set of proteins has been incorrectly
predicted as adhesins or adhesin-like by SPAAN.
j : These proteins are PE_PGRS, PE proteins. Several reports (for example Brennan et
al) indicate that PE_PGRS proteins may be localized to the cell surface and aid in
host-pathogen interaction.
k : Lipoproteins (lpp, lpq, lpr), PPE, outer membrane, surface, transport, secreted,
periplasmic, extracellular, ESAT-6, peptidoglycan binding, exported, mpt (with
extracellular domains), and proteins annotated as 'hypothetical' but found to have
similar functions listed here using BLAST and CDD searches.
l : These proteins were of the following functions - glutaredoxin-like thioltransferase,
putative involvement in molybdate uptake, ATP synthase chain, sulphotransferases,
S.erythraea rhodanese-like protein M29612|SERCYSA_5, unknown function.
m : These proteins were the spike glycoprotein with antigenic properties, and nsp2, nsp5,
nsp6 and nsp7.

Table 5: New putative adhesins predicted by SPAAN in the genomes listed in table 2.






(Total number = 279)

Protein GI Number  Gene ID  Protein Name

Escherichia coli O157:H7
13360742           912619   hemagglutinin/hemolysin-related protein
13362986           914770   putative ATP-binding component of a transport system
13361114           913228   putative tail fiber protein
13364757           913676   minor fimbrial subunit/D-mannose specific adhesin
13362687           915687   putative fimbrial-like protein
13360856           912599   AidA-I adhesin-like protein
13364140           915374   putative fimbrial protein
13359793           914435   putative invasin
13364768           913650   putative invasin
13364034           915471   gamma intimin
13362703           915668   putative DNA transfer protein precursor
13364141           915376   putative fimbrial protein
13359819           914463   AidA-I adhesin-like protein
13360480           917768   putative fimbrial-like protein
13362692           915681   putative fimbrial-like protein
13362585           916824   putative ATP-binding component of a transport system
13359881           914526   putative flagellin structural protein
13361579           917311   putative type 1 fimbrial protein precursor
13360880           913991   curlin major subunit CsgA
13364036           915465   translocated intimin receptor Tir
13360740           912615   putative major pilin protein
13361582           917317   putative ATP-binding component of a transport system and adhesin protein
13364754           913683   export and assembly outer membrane protein of type 1 fimbriae


13360484 


917767 


homolog of Salmonella FimH protein 
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13364751 
13359597 
13362550 
13359595 
13359599 



913688 
913742 
916787 
913739 
913748 



13363900 
13361575 
13364756 
13363496 
13359601 
13364145 
13363902 
13361576 
13361013 
13364755 
13360738 
13363928 
13363495 
13362383 
13364373 
13360879 
13360739 
13361574 
13361127 
13363210 
13361104 
13361709 
13359725 
13360875 
13362170 
13361473 



915704 

917307 

913678 

916142 

913761 

915368 

915708 

917309 

913353 

913682 

912793 

915608 

916144 

916617 

914972 

912479 

912756 

917314 

913212 

916442 

913238 

917446 

914366 

913765 

913927 

917203 



major type 1 subunit fimbrin 
putative fimbrial protein 

putative ATP-binding component of a transport system 
putative fimbrial protein 

probable outer membrane porin protein involved in fimbrial 
assembly 

putative fimbrial protein precursor 

putative fimbrial-like protein 

fimbrial morphology 

truncated putative fimbrial protein 

putative fimbrial-like protein 

putative type 1 fimbrial protein 

putative outer membrane usher protein precursor 

putative outer membrane protein 

putative major tail subunit 

fimbrial morphology 

putative outer membrane usher protein 

alpha-amylase 

putative outer membrane protein 

putative type-1 fimbrial protein 

outer membrane vitamin B12 receptor protein BtuB 

minor curlin subunit precursor CsgB 

putative chaperone protein 

putative fimbrial-like protein 

outer membrane protease precursor 

putative lipoprotein 

major tail protein 

putative major tail subunit 

outer membrane pore protein PhoE 

curli production assembly/transport component CsgF 

putative outer membrane protein 

putative BigB-like protein 
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13364025 
13360081 



915286 
916982 



13362977 
13360351 
13360696 
13361456 
13361626 
13361698 
13362186 
13362697 
13360918 
13360737 
13360342 
13363396 



914779 
917632 
914208 
917206 
917374 
917449 
913421 
915676 
914188 
912506 
917629 
916248 



13361958 912705 



13359921 

13360944 

13359998 

13363390 

13364227 

13361982 

13360129 

13361817 

13360233 

13362837 

13362328 

Haemophilus 

16272254 

16272928 

16272129 



914566 
913890 
914644 
916251 
915153 
912846 
917032 
912692 
917507 
915218 
912985 



EspF protein 

outer membrane receptor for ferric enterobactin (enterochelin) 

and colicins B and D 

hypothetical lipoprotein 

outer membrane protein X 

putative outer membrane precursor 

putative outer membrane protein 

putative outer host membrane protein precursor 

putative outer membrane protein 

putative outer membrane protein precursor 

long-chain fatty acid transport protein FadL 

flagellar hook protein FlgE 

putative outer membrane protein 

putative outer membrane receptor for iron transport 

outer membrane channel TolC 



putative scaffolding protein in the formation of a murein- 
synthesizing holoenzyme 

nucleoside-specific channel-forming protein TSX 
outer membrane receptor for ferric iron uptake 
putative outer membrane transport protein 
putative ferrichrome iron receptor precursor 
outer membrane phospholipase A 
putative outer membrane protein 
a minor lipoprotein 
putative outer membrane protein 
membrane spanning protein TolA 
putative outer membrane lipoprotein 
putative colanic acid biosynthesis glycosyl transferase 
influenzae Rd 

949521 prepilin peptidase-dependent protein D
950762 immunoglobulin A1 protease
951072 lipoprotein 
16273251


950616 


hemoglobin-binding protein


30995429 


950130 


opacity protein 


16272854 


949634 


protective surface antigen D15 


16272283 


950648 


opacity associated protein 


16272604 


949701 


hemoglobin-binding protein 


Helicobacter pylori J99 




4155101 


889157 


putative vacuolating cytotoxin (VacA) paralog


4154798 


890022 


putative vacuolating cytotoxin (VacA) paralog


4155426 


890036 


putative vacuolating cytotoxin (VacA) paralog


4155390 


890075 


vacuolating cytotoxin


4155400 


890058 


outer membrane protein - adhesin 


4155681 


889718 


putative Outer membrane protein 


4155420 


890042 


Outer membrane protein/porin 


4155775 


889799 


outer membrane protein - adhesin 


4155419 


890044 


Outer membrane protein/porin 



4154526 


889066 


putative Outer membrane protein 


4154724 


889419 


putative Outer membrane protein 


4155862 


890404 


putative Outer membrane protein 



4156048 


889958 


putative IRON(III) DICITRATE TRANSPORT PROTEIN 


4154510 


889297 


putative Outer membrane protein 


4155432 


889515 


putative outer membrane protein 


4155623 


889671 


putative Outer membrane protein 


4155700 


889739 


putative Outer membrane function 


4154740 


889426 


Outer membrane protein/porin 


4155692 


889743 


putative Outer membrane protein 


4155594 


889648 


putative outer membrane protein 



4155680 


889719 


putative Outer membrane protein 



4155217 


890243


putative Outer membrane protein


4155958 


889905 


putative Outer membrane protein 


4155201 


890259 


putative Outer membrane protein 


4155013 


889232 


cag island protein 


4154974 


889032 


putative Outer membrane protein 
4155214
4154973 
4155344 
4155099 
4155023 
4155035 



890244 
889042 
890115 
889160 
888978 
889201 



putative Outer membrane protein 
Outer membrane protein 
putative Outer membrane protein 
FLAGELLIN A 
cag island protein 

cag island protein, CYTOTOXICITY ASSOCIATED 
IMMUNODOMINANT ANTIGEN



4155289 



890164 NEURAMINYLLACTOSE-BINDING HEMAGGLUTININ

PRECURSOR 



Mycoplasma pneumoniae 

13507881 877207 involved in cytadherence 
13507880 877268 ADP1_MYCPN adhesin P1



13508228 
13508181 
13508179 



877211 
877124 
877071 



13508178 877118 



13508176 876797 



13508175 



876848 



13508106 
13508350 



876953 
877112 

Mycobacterium tuberculosis H37Rv
15607496 886491 PPE 



species specific lipoprotein 
species specific lipoprotein 

Mollicute specific lipoprotein, MG307 homolog, from M. 
genitalium 

Mollicute specific lipoprotein, MG307 homolog, from M. 
genitalium, 

Mollicute specific lipoprotein, MG307 homolog, from M. 
genitalium 

Mollicute specific lipoprotein, MG307 homolog, from M. 
genitalium 

involved in cytadherence 
similar to phosphate binding protein PstS



15607445 886592 



15610644 
15608588 
15609627 
15610643 
15607718 



888270 
886605 
887941 
888256 
887725 



PPE 

PE_PGRS 
PE_PGRS 
PE_PGRS 
PE_PGRS 
PE_PGRS
15609054


885362 


PPE 


15610486 


888113 


PPE 


15610483 


888120 


PPE 


15610479 


888033 


PPE 



15609771 


888573 


PE_PGRS


15610648 


888306 


PE_PGRS


15610481 


888114 


PE_PGRS


15608117 


885264 


PE_PGRS


15607973 


885391 


PE_PGRS


15608231 


885258 


PE_PGRS 


15608906 


885429 


PE_PGRS


15608891 


885544 


PPE 


15609990 


888171 


PE_PGRS 


15609055 


885506 


PPE 


15608227 


887094 


PE_PGRS


15610524 


888151 


PE_PGRS


15609490 


886003 


PPE 


15607886 


888664 


PE_PGRS 


15609624


887909 


PE_PGRS 


15607420 


886621 


PE_PGRS 


15608897


885325 


PE_PGRS (wag22)


15608590 


886595 


PE_PGRS 


15609728 


887992 


PE_PGRS 


15608012 


885742 


PE_PGRS 


15608534 


886745 


PE_PGRS


15608940 


885730 


PE_PGRS 


15607887 


888662 


PE_PGRS 


15609235 


888312 


PE_PGRS 


15610694 


887822 


PPE 


15609533 


885517 


PE_PGRS 


15610480 




PE_PGRS



Rickettsia prowazekii strain Madrid E
15604316


883411 


CELL SURFACE ANTIGEN (sca3) 


15604546 


883694 


CELL SURFACE ANTIGEN (sca5) 


Porphyromonas gingivalis W83 


34541453 


2551934 


hemagglutinin protein HagA 


34540040 


2551409 


lipoprotein, putative 


34540364 


2552375 


extracellular protease, putative 


34541613 


2552074 


hemagglutinin protein HagE 


34540183 


2551891 


internalin-related protein 


Shigella flexneri 2a str. 2457T 


30065424 


1080663 


minor fimbrial subunit, D-mannose specific adhesin 


30062726 


1077662 


putative adhesion and penetration protein 


30063758 


1078834 


putative fimbrial-like protein 


30065431 


1080671 


major type 1 subunit fimbrin (pilin) 


30063366 


1078379 


flagellar protein FliD 


30064308 


1079668 


outer membrane fluffing protein 


30062613 


1077555 


flagellar hook protein FlgE 


30061954 


1076843 


conserved hypothetical lipoprotein 


30065173 


1080393 


putative lipase 


30065425 


1080664 


minor fimbrial subunit, precursor polypeptide 


30064485 


1079637 


putative fimbrial protein 


30062615 


1077558 


flagellar basal body L-ring protein FlgH 


30064307 


1079452 


outer membrane fluffing protein 


30065601 


1080859 


putative glycoprotein/receptor 


30062118 


1077025 


putative fimbrial-like protein 


30064099 


1079223 


lipoprotein 


30062616 


1077559 


flagellar basal body P-ring protein FlgI


30063546 


1078596 


putative fimbrial-like protein 


30062940 


1077910 


putative outer membrane protein 



30065426 


1080665 


minor fimbrial subunit, precursor polypeptide 


30062779 


1077721 


putative outer membrane protein 


30064194 


1079329 


putative lipoprotein 


30063365 


1078378 


flagellin 
30062298
30064968 
30061858 
30062178 
30062479 
30062565 
30063880 
30064531 
30065033 



1077222 
1080175 
1076740 
1080410 
1077412 
1077506 
1078972 
1079686 
1080243 



outer membrane protein X 
putative major fimbrial subunit 
outer membrane pore protein E (E,Ic,NmpAB) 
minor lipoprotein 
putative fimbrial-like protein 
minor curlin subunit precursor 
putative outer membrane lipoprotein 
cytoplasmic membrane protein 
putative receptor protein 
Streptococcus mutans UA159 

24378550 1029610 putative secreted antigen GbpB/SagA; putative peptidoglycan 

hydrolase 

cell surface antigen SpaP 
putative membrane protein 
penicillin-binding protein 2b 

penicillin-binding protein 1a; membrane carboxypeptidase
glucan-binding protein C, GbpC 

hypothetical protein; possible cell wall protein, WapE 
putative glucan-binding protein D; BglB-like protein 
conserved hypothetical protein; possible transmembrane 
protein 

putative amino acid binding protein 

putative penicillin-binding protein, class C; fmt-like protein 
putative ABC transporter, branched chain amino acid-binding 



24379087 
24380463 
24379075 
24378955 
24379801 
24379528 
24379231 
24380488 



24380291 
24379342 
24380047 



24378708 
24379427 
24379272 
24379641 



1028055 
1029310 
1028046 
1027967 
1028662 
1029536 
1028158 
1029325 



1029139 
1028247 
1028904 



24378698 1029755 



1029768 
1028331 
1028196 
1028511 



protein 

putative ABC transporter, metal binding lipoprotein; surface 


adhesin precursor; saliva-binding protein; lipoprotein receptor 

LraI (LraI family)

putative transfer protein 

cell wall-associated protein precursor WapA 

putative amino acid transporter, amino acid-binding protein 

putative ABC transporter, amino acid binding protein 
Streptococcus pneumoniae R6



15902395 


934801 


Choline-binding protein 


15902381 


934810 


Choline-binding protein F 


15902165 


932894 


Surface protein pspA precursor 


15904047 


934859 


Choline binding protein D 


15904036 


933487 


Choline binding protein A 


15903986


933069 


Choline-binding protein 


15903796 


933669 


Autolysin (N-acetylmuramoyl-L-alanine amidase) 


Neisseria meningitidis Z2491


15794121 


907145 


putative membrane protein 


15794144 


907168 


putative surface fibril protein 


15793284 


906275 


truncated pilin 


15793460 


906456 


IgA-specific serine endopeptidase 


15793282 


906273 


fimbrial protein precursor (pilin) 


15793337 


906332 


adhesin 


15793253 


906243 


putative lipoprotein 


15794356 


907848 


putative lipoprotein 


15793684 


906699 


putative membrane protein 


15793290 


906281 


truncated pilin 


15793283 


906274 


truncated pilin 



15793475 


906471 


haemoglobin-haptoglobin-utilization protein 


15793406 


906401 


porin, major outer membrane protein P.I 


15794985 


907333 


adhesin MafA2 


15794344 


907836 


putative lipoprotein 


15794622 


908118 


hypothetical outer membrane protein 


15793599 


906604 


pilus-associated protein 


15793763 


906779 


putative periplasmic binding protein 


Streptococcus pyogenes MGAS8232 


19745214 


995235 


putative secreted protein 


19746570 


994224 


putative penicillin-binding protein 1a


19745593 


994771 


putative 42 kDa protein 


19745813 


993958 


putative adhesion protein 
19745225


994839 


putative choline binding protein 


19745828 


995250 


streptolysin S associated protein 


19746229 


995021 


putative minor tail protein 


19746909 


994105 


putative laminin adhesion 


19745560 


995061 


putative cell envelope proteinase 


Treponema pallidum subsp. pallidum str. Nichols


15639714 


2611034 


flagellar hook protein (flgE) 


15639609 


2611657 


tpr protein J (tprJ) 


15639111 


2610909 


tpr protein C (tprC) 


15639125 


2610968 


tpr protein D (tprD) 



SARS coronavirus 

31581505 

32187357 

32187342 

30698329 

30421454 

30027620 

29836496 1489668 



30795145 
31416295 
30023954 



30275669 
29837498 



29837501 
29837503 



29837502 



spike protein S [SARS coronavirus Frankfurt 1]

spike protein S [SARS coronavirus HSR 1]

spike glycoprotein [SARS coronavirus ZJ01]

putative spike glycoprotein S [SARS coronavirus TW1]

putative spike glycoprotein [SARS coronavirus CUHK-Su10]

S protein [SARS coronavirus Urbani]

E2 glycoprotein precursor; putative spike glycoprotein [SARS coronavirus]

spike glycoprotein [SARS coronavirus Tor2]

spike glycoprotein S [SARS coronavirus GD01]

putative E2 glycoprotein precursor [SARS coronavirus CUHK-W1]

spike glycoprotein S [SARS coronavirus BJ01]

3C-like proteinase nsp5-pp1a/pp1ab (3CL-PRO) [SARS coronavirus]

putative nsp8-pp1a/pp1ab [SARS coronavirus]
putative nsp10-pp1a/pp1ab; formerly known as growth-factor-like protein [SARS coronavirus]
putative nsp9-pp1a/pp1ab [SARS coronavirus]
Table 6: Hypothetical proteins predicted as putative adhesins by SPAAN in the
genomes listed in Table 2 (total number of proteins = 105)

Protein GI number    Gene ID

Escherichia coli O157:H7
13363955 915578 
13360000 914929 
13362244 912369 
13359999 914888 
13361583 917316 
13361172 913156 
13361131 913207 
13359780 914422 
13360571 912499 
13362197 912893 
13362260 912399 
13360947 913505 
13361464 917196 
13361635 917367 
13362421 916655 
13361463 917195 
Haemophilus influenzae Rd 
16272115 951058 
30995442 950581 
Helicobacter pylori J99 
4155526 889586 
4155712 889748 
4155632 889684 
4156035 889468 
4155499 
Mycoplasma pneumoniae

13507870 877230 
13508239 877245 
13508109 876868 
13508025 877084 
13507838 876784 
13507883 877183 

13507871 877239 
13507944 877056 
13508241 876750 
13507942 877055 
13507840 877387 
13507867 877242 
13508201 877044 
13507941 876985 
13508114 877397 
Mycobacterium tuberculosis H37Rv
15611014 886198 
15610173 887320 
15609513 885515 
15608094 885411 
15610958 886155 
15607528 886436 
15607678 887473 
15609587 885760 
15610708 887227 
15609526 885246 
15611033 886225 
15609028 885094 
15607730 887771 

15609121 885813 

15608255 885951 
15608409 887039



15609124 



885815 



15607734 887797 

Rickettsia prowazekii strain Madrid E 

15604649 883964 

15604322 883472 

15604659 883996 

15604417 883217 

Porphyromonas gingivalis W83 

34540233 2551594 

Shigella flexneri 2a str. 2457T 

30062687 1077638 



30062956 
30063681 
30065435 
30063891 



1080449 
1078754 
1080675 
1078983 



30063211 1078195 



30065233 
30064387 



1080463 
1079531 



30062638 1077590 



30065236 
30061839 



1080466 
1076721 



Streptococcus mutans UA159 
24378864 1029452 
24380475 1029319 



24380237 
24379203 
24380480 
24379275 
24379291 
24379295 
24379804 



1029088 
1028139 
1029320 
1029489 
1028216 
1028215 
1028663 
24379162
24378987 
24379179 
24379166 
24378827 
24380216 



1029417 
1029363 
1028118 
1028107 
1029444 
1029067 



Streptococcus pneumoniae R6 
15902140 932867 
15903446 934616 
15903916 934001 
15903848 933609 
15902832 934332 
15902372 934804 
15902152 932889 
Neisseria meningitidis Z2491 
15793668 906680 



15794714 



907603 



Streptococcus pyogenes MGAS8232 

19747011 993608 
19747024 994165 

19747012 994373 
19746396 995057 
19746651 993824 
19745883 995045 
19745912 994077 

Treponema pallidum subsp. pallidum str. Nichols 
15639844 2611061 
15639720 2611059 

Table 7: The list of 198 adhesins found in bacteria 



PapG (E. coli) 



12837502 
7407201 



7407207

7407205 

147096 

4240529 

7407203 

42308 

7443327 

78746 

18265934 

26111419 

26250987 

26109826 

26249418 

13506767 

42301 

78745 

129622 

147092 

13506906 

7407209 

147080 

281926 

7407199 

147100 

78744 

SfaS (E. coli)

477910 

264035 

42959 

134449 

96425 

FimH (E. coli)
26251208

26111640 

5524634 

29422425 

5524630 

29422435 

29422415 

10946257 

29422419 

11120564 

29422457 

11120562 

29422459 

5524632 

29422455 

29422453 

29422451 

29422449 

29422447 

29422445 

29422443 

29422437 

29422433 

29422431 

29422429 

29422427 

29422423 

29422421 

29422417 

729494 

1361011 

1790775 
3599571

29422441 

12620398 

29422439 

5524628 

1787779 

1742472 

1742463 

15801636 

25321294 

12515169 

11120566 

24051859 

24112911 

13360484 

15800801 

15830279 

25392018 

25500156 

12514120 

1787173 

16128908 

16501811 

16759519 

24051219 

24112354 

30040724 

30062478 

6650093 

5524636 

1778448 



Intimin (E. coli)
17384659

4388530 

1389879 

15723931 

4323336 

4323338 

4323340 

4323342 

4323344 

4323346 

4323348 

4689314 

PrsG (E. coli)

42523 

42529 

7443328 

7443329 

1172645 

HMW1 (Nontypeable H. influenzae) 

282097 

HMW2 (Nontypeable H. influenzae) 

5929966 

Hia (Nontypeable H. influenzae) 

25359682 
25359489 
25359709 
25359628 
25359414 
25359389 
21536216 
25359445 

HifE (H. influenzae) 






MrkD (K. pneumoniae) 
FHA (B. pertussis) 
Pertactin (B. pertussis) 
YadA (Y. enterocolitica) 



SpaP (S. mutans) 
13506868

13506870 

13506872 

13506874 

13506876 

3688787 

3688790 

3688793 

2126301 

1170264 

1170265 

533127 

535169 

3025668 

3025670 

3025672 

3025674 

642038 

127307 

17154501 

33571840 

10955604 

4324391 

28372996 

23630568 

32470319 

26007028 
PAc (S. mutans)

SspA (Streptococcus gordonii) 



CshA (Streptococcus gordonii)
CshB (Streptococcus gordonii) 



ScaA (Streptococcus gordonii) 
SspB (Streptococcus gordonii) 



SpaA (Streptococcus sobrinus) 
PAg (Streptococcus sobrinus) 



Protein F (Streptococcus pyogenes) 
PsaA (Streptococcus pneumoniae) 



CbpA / SpsA / PbcA / PspC
(Streptococcus pneumoniae) 




47267 

129552 

25990270 
1100971 

457707 

18389220 

310633 

25055226 
3220006 

546643 

217036 
47561 

19224134 

18252614 

7920456 

7920458 

7920460 

7920462 



14718654 
2425109 
FimA (Streptococcus parasanguis)
SsaB (Streptococcus sanguis) 
EfaA (Enterococcus faecalis) 
FnbA (Staphylococcus aureus) 
FnbB (Staphylococcus aureus) 



BabA (Helicobacter pylori) 
2576331
2576333 
3153898 
9845483 
19548141 

97883 

97882 

493017 

120457 

581562 

21205592 

13702452 

13309962 
13309964 
13309966 
13309968 
13309970 
13309972 
13309974 
13309976 
13309978 
13309980 
13309982 
13309984 
13309986 
13309988 
13309990
13309992
13309994

Advantages: 

1. The method helps in discovering putative adhesins, which are of great
importance in drug discovery and preventive therapeutics.

2. The method can predict the adhesive nature of even unique proteins,
because it does not depend on homology between the query protein and other
proteins.

3. The method is easy to use. Only the amino acid sequence is required as
input; no other information is needed to predict its adhesive nature.
Claims

1. A computational method for identifying adhesin and adhesin-like proteins, said
method comprising steps of:

a. computing the sequence-based attributes of protein sequences using five
attribute modules of a neural network software, wherein the attributes
are (i) amino acid frequencies, (ii) multiplet frequencies, (iii) dipeptide
frequencies, (iv) charge composition, and (v) hydrophobic composition,

b. training an artificial neural network (ANN) for each of the five computed
attributes, and

c. identifying the adhesin and adhesin-like proteins having a probability of
being an adhesin (P_ad) of > 0.51.
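
As an illustration of step (a)(i) and step (c) of claim 1, the following is a minimal sketch of computing amino acid frequencies and applying the 0.51 cut-off. The function names, the use of the 20 standard residues, and the normalisation by the count of standard residues are assumptions made for this example; they are not taken from the SPAAN program itself.

```python
from collections import Counter

# Assumption: the attribute is computed over the 20 standard residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
P_AD_THRESHOLD = 0.51  # cut-off stated in claim 1


def amino_acid_frequencies(sequence: str) -> list[float]:
    """Return the fraction of each standard residue, in AMINO_ACIDS order."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def is_putative_adhesin(p_ad: float) -> bool:
    """Apply the claim 1 criterion: P_ad strictly greater than 0.51."""
    return p_ad > P_AD_THRESHOLD
```

The 20 resulting values would form one numerical input vector; the other four attributes would be computed by analogous, separately defined functions.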

2. A method as claimed in claim 1, wherein the protein sequences are obtained
from pathogens, eukaryotes, and multicellular organisms.

3. A method as claimed in claim 1, wherein the protein sequences are obtained
from the pathogens selected from a group of organisms comprising Escherichia
coli, Haemophilus influenzae, Helicobacter pylori, Mycoplasma pneumoniae,
Mycobacterium tuberculosis, Rickettsia prowazekii, Porphyromonas
gingivalis, Shigella flexneri, Streptococcus mutans, Streptococcus pneumoniae,
Neisseria meningitidis, Streptococcus pyogenes, Treponema pallidum and
Severe Acute Respiratory Syndrome associated human coronavirus (SARS).

4. A method as claimed in claim 1, wherein the method is a non-homology
method.

5. A method as claimed in claim 1, wherein the method uses 105 compositional
properties of the sequences.

6. A method as claimed in claim 1, wherein the method shows sensitivity of at
least 90%.

7. A method as claimed in claim 1, wherein the method shows specificity of
100%.

8. A method as claimed in claim 1, wherein the method helps identify adhesins
from distantly related organisms.

9. A method as claimed in claim 1, wherein the neural network has multi-layer
feed forward topology, consisting of an input layer, one hidden layer, and an
output layer.
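
The three-layer feed-forward topology of claim 9 can be sketched as follows. This is only an illustration: the sigmoid activation, the random weight initialisation, and the function names are assumptions for the example, since the claims do not specify them. The sizes used in the usage note (20 inputs, 30 hidden neurons) follow the hidden-layer count given for amino acid frequencies in the later claims and assume one input per standard residue.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic activation (an assumption; the claims do not name one)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, b_hidden, w_out, b_out) -> float:
    """One pass through input layer -> one hidden layer -> single output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def random_network(n_inputs: int, n_hidden: int, seed: int = 0):
    """Build small random weights; a trained network would replace these."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = rng.uniform(-0.5, 0.5)
    return w_hidden, b_hidden, w_out, b_out
```

For example, `forward([0.05] * 20, *random_network(20, 30))` returns a value between 0 and 1 that can be read as a probability once the weights have been trained.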
10. A method as claimed in claim 9, wherein the number of neurons in the input
layer is equal to the number of input data points for each attribute.
11. A method as claimed in claim 1, wherein the "P_ad" is a weighted linear sum of
the probabilities from the five computed attributes.
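
A weighted linear combination of the five per-attribute probabilities might be computed as in the sketch below. The module labels A, C, D, H and M follow the later network claim; the weight values in the usage note are placeholders, and dividing by the total weight (so that the result stays between 0 and 1) is an assumption of this example.

```python
def combine_probabilities(probs: dict[str, float],
                          weights: dict[str, float]) -> float:
    """Weighted linear sum of module probabilities, normalised by total weight."""
    total_weight = sum(weights[m] for m in probs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(probs[m] * weights[m] for m in probs) / total_weight
```

With hypothetical inputs `probs = {"A": 0.9, "C": 0.7, "D": 0.8, "H": 0.6, "M": 0.5}` and equal weights, the function returns their plain average; unequal weights shift the result toward the modules given more weight.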

12. A method as claimed in claim 1, wherein each trained network assigns a
probability value of being an adhesin for the protein sequence.
13. A computer system for performing the method of claim 1, said system
comprising a central processing unit executing the SPAAN program, which gives
probabilities based on different attributes using an Artificial Neural Network,
and other built-in programs for assessing attributes, all stored in a memory
device accessed by the CPU; a display on which the central processing unit
displays the screens of the above-mentioned programs in response to user
inputs; and a user interface device.

14. A set of 274 annotated genes encoding adhesin and adhesin-like proteins,
having SEQ ID Nos. 385 to 658.

15. A set of 105 hypothetical genes encoding adhesin and adhesin-like proteins,
having SEQ ID Nos. 659 to 763.

16. A set of 279 annotated adhesin and adhesin-like proteins of SEQ ID Nos. 1 to
279.

17. A set of 105 hypothetical adhesin and adhesin-like proteins of SEQ ID Nos. 280
to 384.

18. A fully connected multilayer feed forward Artificial Neural Network based on
the computational method as claimed in claim 1, comprising an input layer, a
hidden layer and an output layer which are connected in the said sequence,
wherein each neuron is a binary digit number and is connected to each neuron
of the subsequent layer for identifying adhesin or adhesin-like proteins, wherein
the program steps comprise:

[a] feeding a protein sequence in FASTA format;

[b] processing the sequence obtained in step [a] through the 5 modules
named A, C, D, H and M, wherein attribute A represents an amino acid
composition, attribute C represents a charge composition, attribute D
represents a dipeptide composition of the 20 dipeptides [NG, RE, TN,
NT, GT, TT, DE, ER, RR, RK, RI, AT, TS, IV, SG, GS, TG, GN, VI
and HR], attribute H represents a hydrophobic composition and attribute
M represents amino acid frequencies in multiplets, to quantify 5 types of
compositional attributes of the said protein sequence and obtain numerical
input vectors respectively for each of the said attributes, wherein the sum
of numerical input vectors is 105;

[c] processing of the numerical input vectors obtained in step [b] by the
input neuron layer to obtain signals, wherein the number of neurons is
equal to the number of numerical input vectors for each attribute;

[d] processing of signals obtained from step [c] by the hidden layer to obtain
synaptic weighted signals, wherein the optimal number of neurons in the
hidden layer was determined through experimentation for minimizing
the error at the best epoch for each network individually;

[e] delivering synaptic weighted signals obtained from step [d] to the output
layer for assigning a probability value for each protein sequence fed
in step [a] as being an adhesin by each network module; and

[f] using the individual probabilities obtained from step [e] for computing
the final probability of a protein sequence being an adhesin, denoted by
the P_ad value, which is a weighted average of the individual probabilities
obtained from step [e] and the associated fraction of correlation, which is
a measure of the strength of the prediction.
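
Steps [a] and [b] above, for the dipeptide module D only, can be sketched as follows. The dipeptide list is the one recited in step [b]; the FASTA reader, the function names, and the normalisation by the number of overlapping residue pairs are assumptions made for this illustration and may differ from the actual program.

```python
# The 20 dipeptides recited in step [b] of the claim.
DIPEPTIDES = ["NG", "RE", "TN", "NT", "GT", "TT", "DE", "ER", "RR", "RK",
              "RI", "AT", "TS", "IV", "SG", "GS", "TG", "GN", "VI", "HR"]


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence} (minimal reader)."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else f"seq{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def dipeptide_frequencies(sequence: str) -> list[float]:
    """Frequency of each listed dipeptide among all overlapping residue pairs."""
    n_pairs = len(sequence) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = {d: 0 for d in DIPEPTIDES}
    for i in range(n_pairs):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / n_pairs for d in DIPEPTIDES]
```

Each sequence returned by `read_fasta` would be passed to `dipeptide_frequencies` to obtain the 20-value input vector for module D.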
19. A network as claimed in claim 18, wherein the input neuron layer consists of a
total of 105 neurons corresponding to 105 compositional properties.
20. A network as claimed in claim 18, wherein the hidden layer comprises
30 neurons for amino acid frequencies, 28 for multiplet
frequencies, 28 for dipeptide frequencies, 30 for charge composition and 30 for
hydrophobic composition.

21. A network as claimed in claim 18, wherein the output layer comprises
neurons to deliver the output values as a probability value for each protein
sequence.
Figure 1: The neural network architecture
Figure 2: Assessment of SPAAN using a defined test dataset
Figure 3 (a)
Figure 3 (b)
<120> Title :

<130> AppFileReference

<140> CurrentAppNumber

<141> CurrentFilingDate



Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MINLSKEATV GKALTPIAIL MMLSFPVASQ AAGLVIKNGT VYNANGVPW DINKPNGSGL 60
SHNIWDNLNV DKNGWFNNS ANESSTSLAG NIQGNSNLTS GSAKVILNEV TSKNPSTING 120

MMFV.vjDK.-r ij^v;??:r*T vNCGGcirrv rj^rLrx^rpn iQDDKiwrrs ^"tit:.:.;: iza 

LDNASPTEIL SRNWVNGKV SADELNWAG NNYVNAAGQV TGSVSATGSR NGYSVDVAKL 240
GGMYANKISL VSTEKGVGVR NLGVIAGGVN GVSIDSKGNL LNSNAQIQSA STINLTTNGT 300
LDNTTGTVTS VGTISBNTNK NTIVNTRAGKF ISTMGDIYVN SGTIDNTNGK LAAAGMLAVD 360
TNNATLINSG KGSSVGIEAG LVALKTGTLN NSNGQIRGGY VGLESAALNN NNGDIQTTGD 420
IAIISNGNVD NNKGLIRSST GHIVIGAAGS VNNGSTKTAD TGSSDSLGII ADTGVEIGAN 480
NINNNGGQIA SNGNVSLSSY STIDDYAGKI LSNSKVTIKG SSLRNDTGGI SGKQGIEVAV 540
GGSLTNNIGV ISSEEGDISL LANSVDNHGG FMMGQNITME SMSGVNNNTA LIVASKKLKI 600
MARGSIENRD GNNFGNAYGL YFGMPQQTGG MVGKEGIELS GQNIYNNNSR LIAEDGPLTL 660
QAQNTFDNTR ALVTSGADAS IQVGGTYYNN YATTWSAGNL DIDATTLQNS SSGTMIDNNA 720
TGFIASDKNL SLEWNSLTN YGWISGKGDV DVTVNNGNLY MRNTIAAEKG LDIAALNGIE 780
NWKDISAGGD LTMNTNRHVT NNSNSNMVGQ NIVINAVNDI NNRGNIVSDA DLNVTTKGNL 840
YNYLYMVGYG DIALSANSVA NNNATIEATG DLIIDSKGNV GNNRGNLHAL NGVLSVKGNN 900
LNNDNGEIRG YGDVTLALTG NYDSYKGSLT SETGDVTLTA NIVDNAYGLI AGEWSVDAK 960
STIYNNTALI AANKKLVINA GGNLENRDGN NFLRNMGALF GITDNVGGIV GKEGVTLSAQ 1020
NVYNNNSSII AENGPLNLLS RGTLDNTRAL LSSGADAIIR AAGTFYNNYA TTYSAGNLDV 1080
YAASLNNASD GRLEDNTATG VIASDKNLDL SVDNSVTNYG WISGKGDVHF NTVLKGTLYNR 1140
NAIAADNALT INALNGVENF KDIVAGTALT IDTQKYVTNN SNSNMLGQTI AINAVNDINN 1200
RGNIVGDYSL GVKTTGNIYN YLNMLSYGVA GVSANKVTNS GKDAVLGGFY GLALEANETD 1260
NTGTIVGM 1268
<212> Type : PRT
<211> Length : 1268

SequenceName : SEQ ID 1
SequenceDescription : 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNTIHLRCLF RMNPLVWCLW ADVAAKLRSL KRYSVFTFQR MKFMNRTSPY YCRRSVLSLL 60
ISALIYAPPG MAAFTPDVIG WNDETVDGS QRVDERGTTN NTHIINHGQQ NVYGGVSNGS 120
LIESGGYQDV GRHNNYVGQS NNTTINGGRQ SIHDGGISTG TIIESGNQDV YKGGISNGTT 180
IKGGASRVEG GSANGTLIDG GSQIVKVQGH ADGTTINKSG SQDWQGSLA TNTTINGGRQ 240
YVEQSTVETT TIKNGGEQRV YESRALDTTI EGGTQSLNSK STAKNTQIYS GGTQIIDNTS 300
SSDVIEVYSG GVLDVSGGTA TNVTQHDGAI LKTNTNGTTV SGTNSEGAFS IHNHVADMVL 360
LENGGHLDIN AYGSANKTII KDKGTMSVLT NAKADATRID NGGVMDVAGN ATNTIINGGT 420
QNINNYGIAT GTNINSGTQN IKSGGKADTT IISSGSRQW EKDGTAIGSN ISAGGSLIVY 480
TGGIAHGVNQ ETGSALVANT GAGTDIEGYN KLSHFTITGG EANYWLENT GELTWAKTS 540
AKNTTIDAGG KLIVQKEAKT DSTRLNNGGV LEVQDGGEAK HVEQQSGGAL IASTTSGTLI 600
EGTNSYGDAF YIRNSEAKNV VLENAGSLTV VTGSRAVDTI INANGKMDVY GKDVGTVLNS 660
AGTQTIYASA TSDKANIKGG KQTVYGLATE ANIESGEQIV DGGSTEKTHI MGGTQTVQNY 720
GKAINTDIVS GLQQIMANGT AEGSIINGGS QIVNEGGLAE NSVLNDGGTL DVREKGSATG 780
IQQSSQGALV ATTRATRVTG TRADGVAFSI EQGAANNILL ANGGVLTVES DTSSDKTQVN 840
TGGREIVKTK ATATGTTLTG GEQIVEGVAN ETTINDGGIQ TVSANGEAIK TTINEGGTLT 900
VNDNGKATDI VQNSGAALQT STANGIEISG THQYGTFSIS GNLATNMLLE NGGNLLVLAG 960
TEARDSTVGK GGAMQNQGQD SATKVNSGGQ YTLGRSKDEF QALARAEDLQ VAGGTAIVYA 1020
GTLADASVSG ATGSLSLMTP RDNVTPVKLE GAIRITDSAT LTIGNGVDTT LADLTAASRG 1080
SVWLETSNNSC AGTSNCEYRV NSLLLNDGNV YLSAQTAAPA TTNGIYNTLT TNELSGSGNF 1140
YLHTNVAGSR GDQLWNNNA TGNFKIFVQD TGVSPQSDDA MTLVKTGGGD ASFSLGNTGG 1200
FVDLGTYEYV LKSDGNSNWN LTNDVKPNPD PNPNPMPNPK PDPKPDPKPD PKPDPTPEPT 1260
PTPVPEKRIT PSTAAVLNMA ATLPLVFDAE LNSIRERLNI MKASPHNNNV WGATYNTRNN 1320
VTTDAGAGFE QTLTGMTVGI DSPNDIPEGI ATLGAFMGYS HSHIGFDRGG HGSVGSYSLG 1380
GYASWEHESG FYLDGWKLN RFESNVAGKM SSGGAANGSY HSNGLGGHIE TGMRFTDGNW 1440
NLTPYASLTG FTADNPEYHL SNGMESKSVD TRSIYRELGA TLSYNMRLGN GMEIEPWLKA 1500
AVRKEFVDDN RVKVNNDGNF VNDLSGRRGI YQAGIKASFS STLSGHLGVG YSHGAGVESP 1560
WNAVAGVNWS F 1571
<212> Type : PRT
<211> Length : 1571
SequenceName : SEQ ID 2

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MAVKISGVLK DGTGKPVENC TIQLKARRNS ATVWNTVAS ENPDEAGRYS MDVEYGQYSV 60
ILLVEGFPPS HAGTITVYED SQPGTLNDFL GAMTEDDVRP EALRRFELMV EEVARNASAV 120

AQNTAA^ Y KS ^CT^AOTtXftR?*! .AATHATDAAD SA^LTiL'TSAG QAASSAQSAS SSAGTASTKA J 30 

TEASKSAAAA ESSKSAAATS AGAAKTSETN AAVSQQSAAT SASTATTKAS EAASSARDAS 240
ASKEAAKSSE TSAASSASSA ASSATAAGNS AKAAKTSETN AKSSETAAEQ SASAAAGSKT 300
AAALSASAAS TSAGQASASA TAAGKSAESA ASSASTATTK AGEATEQASA AASSASAAKT 360
SETNAKASET SAESSKTAAA SSASSAASSA SSASASKDEA TRQASAAKSS ATTASTKATE 420
AAGSATAAAQ SKSTAESAAT RAETAAKRAE DIASAVALED ASTTKKGIVQ LSSATMSTSE 480
SLAATPKAVK AAYEIaANGKY TAQDATTAQK GIVQLSMATN STSEMLAATP KSVKAAYDLA 540
NGKYTAQDAT TAQKGIVQLS SATNSASETL AATPKAVKAA NDNANGRVPS ARKVNGKALS 600
SDITLTPKDI GTLNSTTMSF SGGAGWFKLA TVTMPQASSV VSITLIGGAG FNVGSPQQAG 660
ISELVLRAGN GNPKGITGAL WQRTSTGFTN FAWVNTSGDT YDIYVAIGNY ATGVNIQWDY 720
TSNASVTIHT SPAYSANKPE GLTDGTVYSL YTPSEQFYPP GAPIPWPSDT VPSGYALMQG 780
QTFDKSAYPK LAAAYPSGVI PDMRGWTIKG KPASGRAVLS QEQDGIKSHT HSASASSTDL 840
GTKTTSSFDY GTKSTNNTGA HTHSVSGTAA SAGNHTHSVT GASAVSQWSQ NGSVHKWSA 900
ASVNTSAAGA HTHSVSGTAA SAGAHAHTVG IGAHTHSVAI GSHGHTITVN AAGMAENTVK 960
NTIAFNYIVRL A 971
<212> Type : PRT
<211> Length : 971

SequenceName : SEQ ID 3
SequenceDescription : 



Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKRVITLFAV LLMGWSVMAW SFACKTANGT AIPIGGGSAN VYVNLAPAVN VGQNLWDLS 60
TQIFCHNDYP ETITDYVTLQ RGSAYGGVLS NFSGTVKYSG SSYPFPTTSE TPRWYNSRT 120
DKPWPVALYL TPVSSAGGVA IKAGSLIAVL ILRQTKNYNS DDFQFVWNIY ANNDVWPTG 180
GCDVSARDVT VTLPDYPGSV PIPLTVYCAK SQNLGYYLSG TTADAGNSIF TNTASFSPAQ 240
GVGVQLTRNG TIIPANNTVS LGAVGTSAVS LGLTANYART GGQVTAGNVQ SIIGVTFVYQ 300

<212> Type : PRT
<211> Length : 300

SequenceName : SEQ ID 4
SequenceDescription : 



Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MGYVTGGLPM KNNRAWALIS GLILFSGTAP AADNLHFTGN LLGKSCTPVI NGNLLAEIHF 60
PTIAASDLMQ RGQSDRVPLV FQLKDCKSTT AFNVKVTLMG TEDTDLPGFL SIDSSSSATG 120
VGIGIETAGG AAVPINSTTG ASFPLNQGNN SVNFNAWLQT VNGRNVTSGD FTATMTVTFE 180
YF 182

<212> Type : PRT
<211> Length : 182

SequenceName : SEQ ID 5
SequenceDescription :



Sequence 



<213> OrganismName : Escherichia coli 0157:H7 
65 <4 00> PreSequenceString : 

MKRHLNTSYR LVWNHITGTL WASELARSR GKRAGVAVAL SLAAVTSVPA LAADKWQAG 60 
ETVNDGTLTN HDNQIVFGTA NGMTISTGLE LGPDSEENTG GQWIQNGGIA GNTTVTTNGR 120 
QWLEGGTAS DTVIRDGGGQ SLNGLAVNTT LNNRGEQWVH EGGVATGT 1 1 NRDGYQSVKS 180

GGLATGTIIN TGAEGGPDSD NSYTGQKVQG TAESTTINKN GRQIILFSGL ARDTLIYAGG 240 

DQSVHGRALN TTLNGGYQYV HRDGLALNTV INEGGWQWK AGGAAGNTTI NQNGE LRVHA 3 00 

GGEATAVTQN TGGALVTSTA ATVIGTNRLG NFTVENGKAD GWLESGGRL DVLESHSAQN 3 60 

5 TLVDDGGTLA VSAGGKATSV TITSGGALIA DSGATVEGTN ASGKFS IDGT SGQASGLLLE 42 0 

NGGSFTVNAG GQAGNTTVGH RGTLTLAAGG SLSGRTQLSK GAS MVLNGDV VSTGDIVNAG 480 

EIRFDNQTTP NAALSRAVAK SNSPVTFHKL TTTNLTGQGG TINMRVRLDG SNASDQLVIN 540 

GGQATGKTWL AFTNVGNSNL GVATTGQGIR WDAQNGATT EEGAFALSRP LQAGAFNYTL 600 

NRDSDEDWYL RSENAYRAEV PLYTSMLTQA MDYDRI LAGS RSHQTGVNGE NNSVRLSIQG 660 

10 GHLGHDNNGG IARGATPESS GSYGFVRLEG DLLRTEVAGM SLTTGVYGAA GHSSVDVKDD 72 0 

DGSRAGTVRD DAGSLGGYLN LVHTSSGLWA DIVAQGTRHS MKASSDNNDF RARGWGWLGS 780 

LETGLPFSIT DNLMLEPQLQ YTWQGLSLDD GQDNAGYVKF GHGSAQHVRA GFRLGSHNDM 840 

TFGEGTSSRD TLRDSAKHSV SELPVNWWVQ PSVIRTFSSR GDMSMGTAAA GSNMTFSPSR 9 00 

NGTSLDLQAG LEARISENIT LGVQAGYAHS VCGCSAEGYN GQATINMTF 949

<212> Type : PRT
<211> Length : 949

SequenceName : SEQ ID 6
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKKWHYIFCI ILFHLGLPCG YAANDGTCAT RGGTHTLSLN FPLTTVSAAN NVPGNTLIDI 60 
25 ANATSSENYS VLCNCDSKHS NGAYHEIYYT ADPAPGMVYS TTASGLAFYY LNEYVDVGTK 12 0 

I S VLNAGYTA VPFEHVSNQA TTTDHTCQGN KTTAVGVSLK TGADAKISFR IKRSINGTW 180 
IPITDIALLY ANISSTTTRG EAIAKVRISG SLTAPQSCQI NAGQVIYFDF DTIPASEFSS 240 
TAGQAITSRK ITKTVSIECT GMGYERTQKV DAS FTGTNRS SDDTMVATDN ADVGIKIYNK 3 00 

SNAEVSVNNG KLPADMGNTT I FGRKNGS VT FSAAPASFTG ARPQPGVFNA TATLTIEFVN 3 60 

<212> Type : PRT
<211> Length : 360

SequenceName : SEQ ID 7
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

40 MSRYKTGHKQ PRFRYSVLAR CVAWANISVQ VLFPLAVTFT PVMAARAQHA VQPRLSMGNT 60 

TVTADNNVEK NVAS FAANAG TFLSSQPDSD ATRNFI TGMA TAKANQEIQE WLGKYGTARV 12 0 

KLNVDKDFSL KDSSLEMLYP IYDTPTNMLF TQGAIHRTDD RTQSNIGFGW RHFSGNDWMA 180 

GVNTFIDHDL SRSHTRIGVG AEYWRDYLKL SANGYIRASG WKKSPDIEDY QERPANGWDI 240 

RAEGYLPAWP QLGASLMYEQ YYGDEVGLFG KDKRQKDPHA ISAEVTYTPV PLLTLSAGHK 3 00 

45 QGKSGENDTR FGLEWYRIG EPLAKQLDTD SIRERRVLAG SRYDLVERNN NIVLEYRKSE 3 60 

VIRIALPERI EGKGGQTLSL GLWSKATHG LKNVQWEAPS LLAEGGKITG QGSQWQVTLP 420 

AYRPGKDNYY AISAVAYDNK GNTS KRVQTE WITGAGMSA DRTALTLDGQ SRIQMLANGN 480 

EQKPLVLSLR DAEGQPVTGM KDQIKTELTF KPAGNIVTRS LKATKSQAKP TLGEFTETEA 540 

GVYQSVFTTG TQSGEATITV SVDGMSKTVT AELRATMMDV ANSTLSANEP SGDWADGQQ 600 

50 AYTL TLTAVD SEGNPVTGEA SRLRFVPQDT NGVTVGAISE IKPGVYSAAV SSTRAGNWV 660 

RAFSEQYQLG TLQQTLKFVA GPLDAAHSS I TLNPDKPWG GTVTAIWTVK DAYDNPVTSL 720 

TPEAPSLAGA AAEGSTASGW TNNGDGTWTA QITLGSTAGE LEVMPKLNGQ NAAANAAKVT 78 0 

WADALSSNQ SKVSVAEDHV KAGESTTVTL VAKDAHGNA I SGLALSASLT GTASEGATVS 840 

SWTEKGNGSY VATL TTGGKT GELRVMPLFN GQPAATEAAQ LTVIAGEMSS ANSTLVADNK 900 

55 APTVKTTTEL TFTVKDAYGN PVTGLKPDAP VFSGAASTGS ERPSAGNWTE KGNGVYVSTL 960 

TLGSAAGQLS VMPRVNGQNA VAQPLVLNVA GDASKAEIRD MTVKVNNQLA NGQSANQITL 1020 

TWDTYGNPL QGQEVTLTLP QGVTS KTGNT VTTNAAGKAD I ELMSTVAGE HNISASVNGA 1080 

QKTVTVKFNA DASTGQANLQ VDAAAQKVAN GKDAFTLTAN VEDKNGNPVP GSLVTFNLPR . 1140 

GVKPLTGDNV WVKANDEGKA ELQWSVTAG TYEITASAGN SQPSNTQTIT FVADKATATV 12 0 0 

60 SGIEVIGNYA LADGNAKQTY KVTVTDANNN LLKDSEVTLT ASPANLVLTP NGTAKTNEQG 12 60 

QAIFTATTTV AAKYTLTAKV SQADGQESTK TAESKFVADD TNAVLTASSD VTSLVADGIS 132 0 

TAKLEVTLMS ANNPVGGNMW VDIKTPEGVT EKDYQFLPSK NDHFVSGKIT RTFSTSKPGV 13 80 

YTFTFNALTY GGYEMKPVTV T I TAVDADTA KGEEAMN 1417 
<212> Type : PRT
<211> Length : 1417

SequenceName : SEQ ID 8
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MARGWASSEA SGAMTDWLNN FGTARISLGV DEDFSLKNSQ FDFLHPWYDT PDYLLFSQHT 60
LHRTDDRTQI NTGLGWRHFT SSWMSGINLF FDHDLSRYHS RAGLGAEYWR DYLKLSSNAY 120
IGLTGWRSAP ELDNDFEARP ANGWDLRAEG WLPAWPQLGG KLVYEQYYGD EVALFDKNDR 180
QSNPHAITAG LNYTPFPLLT LSAEQRQGKQ GENDTRFAVD LTWQPSSSMQ KQLNPDEVAG 240
RRSLAGSRYD LIDRNNNIVL EYRKKELIRL SLLDPVKGKS GEIKPLVSSL QTKYALKGYN 300
IEAAALEAAG GKVSTSGKDI TVTLPGYRFT NTPETDNTWS IDVTAEDVKG NIjSRHEQSMV 360
VIQAPTLSQK DSLLSVNPLT VAADKKSTTT LTVTAHDSDG TPVPGLALQT RSEGVQDITL 420
SDWTDNGDGS YTQILTAGTT SGSVTLTPQI NGESAVKESI WNIVPWSS RDHSSITIDN 480
VSYYAGDDIK V^VELKDDSN QPVAYQKEEL VKAVTVEIJSK PGATIVWHEE QPGVYAANYP 540
AYKQGTALiRA QLSLHNWNAP LQSHIYNIEA NQNKARVATL SATNNDVYAD KKTFNTLTIN 600
VTDESDNPLT NHQVTFKNEK GSAEFVEPPQ QNTDAYGVAT INMVSQVAEE NTISATLPNG 660
FSQRIIAKFV SDSSTPKFKQ LVADPDTIIA GNSQGSTLTA IITDFHNNPL KDMKVNFVAP 720
GGSQLDNTTA TTDQSGIVRV HLTSSKAGSY SVDASLEVDK NIHQSVTITV VPNREQSVMT 780
LNAGSGSAIA NNTNIVTLTA SVKDVYGHPL PDEDVKFTLP ASMTGNFTLS SETARTDANG 840
DAWTLRGTK AGEFTVTATL TRNNTVAYQQ VTFIGDTNSA QLQPLTASLN SIVAGNSTGS 900
TLTATILDAY QNPLKDQLVT FQSNDVTLSE TEVTTNTLGQ ATVTMTSNIA GQHNVWSRK 960
AQASDNKTFS LSVLPDESSA KVISITGAEK TITVGENITL RILVQDAFNN VIAGQRVRLS 1020
AQPTTNITIG DTAYTDNNGY AYVNLLSTQP GVYQVTATLD NNSSSKVDVN VANGKLELTS 1080
SKPETTVHNS EGITLTATAR NARGELMPGQ IITFSVTPEG ATLSNTGEVL TDQSGQAKVT 1140
LTSDKVNVYT VTAIMGKDVP VQSQVTVAVK ADAKTAHWS WASPDTITA DGIDSSTITS 1200
RVEDDYGFPV EGVDISHGLD TKGSPWNIP TTRTDQSGQV TATITSTLAE TLTVNVQVPG 1260
TANQSATITL VAGTADESKS ILKSDVDTLK ADYQQSAKLT LTLQDKYGNP IVTSDHLEFV 1320
QSGPFVNFLK LSDIDYSQRN YGEYTVTVTG GKEGTATLIP MLNGVHQANL SISLNLIQSI 1380
KEMSGHVTAN NHTFSTAKFP SEGFAGAYYT LNNDNFEAGK TVDDYMFSSS QGWVSVDASG 1440
KVSFANIGDQ TSVTISAVPR QGGTTYQTLI KLKGWWVNNG NHTNIWLAAN ALCHAKNDGY 1500
NLPGITHLTS GENKRTQGSL YGEWGNVGAF SSNSQFTPGA YWTSESDDYS RHYYVQMLTG 1560
MTGSDADSSP QLTACRKSL 1579
<212> Type : PRT
<211> Length : 1579

SequenceName : SEQ ID 9
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MITHGCYTRT RHKHKLKKTL IMLSAGLGLF F YVNQNS FAN GENYFKLGSD SKLLTHDSYQ 60 

NRLFYTLKTG ETVADLSKSQ DINLSTIWSL NKHLYSSESE MMKAAPGQQI ILPLKKLPFE 12 0 

YSALPLLGSA PLVAAGGVAG HTNKLTKMSP DVTKSNMTDD KALNYAAQQA ASLGSQLQSR 18 0 

45 S LNGD YAKDT ALGIAGNQAS SQLQAWLQHY GTAEVNLQSG NNFDGSSLDF LLPFYDSEKM 240 

LAFGQVGARY IDSRFTANLG AGQRFFLPAN MLGYNVFIDQ DFSGDNTRLG I GGEYWRD YF 3 00 

KSSVNGYFRM SGWHESYNKK DYDERPANGF DIRFNGYLPS YPALGAKL I Y EQYYGDNVAL 360 

FNSDKLQSNP GAATVGVNYT PIPLVTMGID YRHGTGNEND LLYSMQFRYQ FDKSWSQQIE 42 0 

PQYVNELRTL SGSRYDLVQR NNNIILEYKK QDILSLNIPH DINGTEHSTQ KIQLIVKSKY 480 

50 GLDRIVWDDS ALRSQGGQIQ HSGSQSAQDY QAILPAYVQG GSNIYKVTAR AYDRNGNSSN 540 

NVQLTITVLS NGQWDQVGV TDFTADKTSA KADNADT I T Y TATVKKNGVA QANVPVSFNI 600 

VSGTATLGAN SAKTDANGKA TVTLKSSTPG QVWSAKTAE MTSALNASAV IFFDQTKASI 660 

TE I KADKTTA VANGKDAIKY TVKVMKNGQP VNNQSVTFST NFGMFNGKSQ TQATTGNDGR 720 

ATITLTSSSA GKATVSATVS DGAEVKATEV TFFDELKIDN KVDIIGNNVR GELPNIWLQY 780 

55 GQFKLKASGG DGTYSWYSEN TSIATVDASG KVTLNGKGSV VIKATSGDKQ TVSYTIKAPS 840 

YMIKVDKQAY YADAMS I CKN LLPSTQTVLS DIYDSWGAAN KYSHYSSMNS ITAWIKQTSS 9 00 

EQRSGVSSTY NLITQNPLPG VNVNTPNVYA VCVE 934 
<212> Type : PRT
<211> Length : 934

SequenceName : SEQ ID 10
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
MLVLSESFKN KLLPMNGYMK GGSDSGSKAQ ARATEKGIEL QREMWQTNMQ NLAPFTPLAQ 60 



QYVSQLQNLS SLQGQGQALN QYYNSQQYKD LAGQARYQSL AAAEATGGLG STATGNQLAA 120

IAPTLGQNWL SGQMNNYNNL ANIGLGALTG QANAGQNYAN NVSQLYQQQA AASAANANKP 180 
SGLQSFATGA IGGAASGAMI GSAVPVIGTG IGALAGGVIG GLGSLF 226
<212> Type : PRT
<211> Length : 226

SequenceName : SEQ ID 11
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKKILSGLIL LLCCPYGFAA NGDGATHMSN LSFGPLTVAA ANNHSGYNIF EALSNTTGTY 60 
E>VRGHCDDTH GGPGQQTAFF P I F Y TGDAA P GLVLERTLMG LOTYALNDYL SVGVTIFIIN • 1-2 0 

15 2STQYAAI PFEH LSNQSTSPQH TCGAGNNGST VNLDSGRSAK IiSFYVRHSIT GTVTI PTTEV 18 0 

AWLYAGMSDH FPKTTPVSKV TIRGQLTAPQ NCELTPHQSI DVDFQKINSA EFSSTAGSII 240 

AERKIKTEVT VSCTGMEDVR STEWSASMI AANRSADATM IVTSNPDVGI KIFDKNDRPV 3 00 

NVDGGNXj PAD MGAISRLGKT DGSVTFYSAP ASLTGAKPAP DNGFTATATL VIEFTN 3 56 

<212> Type : PRT
<211> Length : 356

SequenceName : SEQ ID 12
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNKIYRLKWN RSRNCWSVCS ELGSRVKGKK SRAVLISAIS LYSSLVFADD VIVNQDKTID 60 

30 FGKENQSIDY RITVTDNANL VINATDTSRP RLTLASGGGL DITGGKVTIN GPLNF LLKGT 12 0 

GFLNVSNAGS ELYADDLYES NSGMRHDRGY FNVSNGGKIH VKGTSRLTYL QGNVSGEGSQ 180 

VTsTSETFFMGV YGSYGGNQYL SVHMGGEVNA RKQISLGYYD QVSDTTLAVS EGGKI S APT I 240 

SLSTNSELAL GAQEGSAAKA AGIIDAEKIE FWAKTSEKK ITLNHTDKDA TISADIVSGS 3 00 

KGLGYINALHT GTTYLTGDNS AFSGKVKIEQ NGALGITQNT GTAEINNRGK LHLKADDSMT 3 60 

35 FANK I S GNGT ISIDSGTVEL TGNNYAFSGY IDVASGAVAV ISEDKNIGRA ELDVDGKLQI 42 0 

MANKDWFDN DLEGRGI VE I NMGNHEFSFD EFAYTDWFQG SLAFQNTTFN LEKNAEFLQK 48 0 

GGITAGQGSL VTVGKGAHS I STLGFSGGTV DFGALTAGAQ MTEGTVNVSK TLDLRGEGVI 540 

QVSDSDWRS VSRDIDSALS LTEVDDGNST I KLVDAQGAE VLGDAGNLQL QDKNGQILSS 600 

SAQRDIQQMG QKAAVGTYDY RLTSGVNNDG LYIGYGLTQL DLHATD SDAL VLS SNGKS EN* 660 

40 AADL S AKITG SGDLAFSSQK GQTVSLSNKD NDYTGVTDLR SGTLLLNNDN VLGNTHELRL 720 

AAETELDMNG HSQTVGTLNG SADSLLSLNG GSLTVTNGGT STGSLTGSGE LNIQGGTLDI 780 

AGDNSNLTAW VNIANSANVL VSHAQGLGSA NVENNGTLAL NNSAEKRAAA SVNYALGGNL 840 

TNNGTLMTGM SGQQAGNVLV VKGNYHGNNG QLVMNTVLNG DDSVTDKLW EGDTS GTTAV 9 00 

TVNNAGGTGA KTLNGIELIH VDGKSEGEFV QAGRIVAGAY DYTLARGQGA NSGNWYLTSG 960 

45 SDSPELQPEP DPMPNPEPNP NPEPNPNPTP TPGPDLNVDN DLRPEAGSYI ANLAAAMTMF 102 0 

XTRLHERLGN TYYTDMVTGE QKQTTMWMRH EGGHNKWRDG SGQLKTQSNR YVLQLGGDVA 108 0 

QWSQNGSDRW HVGVMAGYGN SDSKTISSRT GYRAKASVNG YSTGLYATWY ADDESRNGAY 1140 

LDSWAQYSWF DMTVKGDDLQ SESYKSKGFT ASLEAGYKHK LAEFMGSQGT RNEWYVQPQA 12 0 0 

QVTWMGVKAD KHRESNGTLV HSNGDGNVQT RLGVKTWLKS HHKMDDGKSR EFQPFVEVNW 1260 

50 L.HNSKDFSTS MDGVSVTQDG ARNIAEIKTG VEGQLNANLN VWGMVGVQVA DRGYNDTSAM 13 2 0 

VGIKWQF 1327
<212> Type : PRT
<211> Length : 1327

SequenceName : SEQ ID 13
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

M I TMKKS VLT AFITWCATS SVMAADDNAI TDGSVTFNGK VIAPACTLVA ATKDSWTLP 60 
DVSATKLQTN GQVSGVQTDV PIELKDCDTT VTKNATFTFN GTADTTQITA FANQASSDAA 120 
T3STVAL QM YMM DGTTAI KPDT ETGNILLQDG DQTLTFKVDY IATGKATSGN VNAVTNFHIN 180 

y:y 182 
<212> Type : PRT
<211> Length : 182

SequenceName : SEQ ID 14
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MSKFVKTAIA ATMVMGAFAS TSTIAAGNNG TARFYGTIED SPCSIVPDDH KLEVDMGDIG 60 
SGILKNNGTS TPKAFQIHLQ DCVFDTQTTM TTTFTGNASS TNSGNYYTIY NTDTGAAFNN 12 0 

VS IiAI GDAQG TSYKSGAGIE QKIVNDTATN KGKAKQTIiDF KAWLVGAADA PDLGNFEANT 180 
TFQITYL 187 
<212> Type : PRT 
<211> Length : 187 

SequenceName : SEQ ID 15 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MRVIFLRKEY LSLLPSMIAS LFSANGVAAA IDLCQGYDIK ASCHASRQSL SGITQVWSIA 60 

DGQWLVFSDM TNNAS GGAVF LQQGAEFTLS PENETGMTLF ANN TVS GE YN NGGAIFAKEN 120 

STLNLTDVIF SGNVAGGYGG AIYSSGTNDT GAI DLRVTNA VFRNNIANDG KGGAIYTINN 180 

DIYLSDDVFN NNQAYTSTSY SDGDGGAIDV TDNNSDSKHP SGYTIINNTA FTNNTAEGYG 240 

GAIYTNSATA PYLIDISVDD SYS QNGGVLV DENNSAAGYG DGPSSAAGGF MYLGLSEVTF 3 00 

D IAD GKTLVI GNTENDGAVD SIAGTGLITK TGS GDIiVLNA DNNDFTGEMQ I ENGE VTLGR 3 60 

SNSLMNVGDT HCQDDPQDCY GLTIGSIDKY QNQAELNVGS TQQTFAHSLT GFQNGTLNID 420 

AGGNVTVNQG SFAGTIEGAG QLTIAQNGSY VLAGAQSMAL TGDIWDAGA VLSLEGDAAD 480 

LAALQDDPQS IVLNGGMLDL SDFSTWQSGT SYKDGLEVSG SSGTVIGSQD WDLAGGNDM 540 

HIGGDGKDGV YWIDAGDGQ VSLANDNQYL GTTQIASGTL MVS DNS QLGY THYNRQVI FT 60 0 

DKPQESVMEI TANVDTRSTT TEHGRDIEMR ADGEVAVDAG VDTQWGALMA DSSGQHQDEG 660 

STLTKTGAGT LELTASGTTQ SAVRVEEGTL QGDVADIFPY ASSLWVGDGA TFVTGADQDI 720 

QSIDATSSGT IDISDGTVLR LTGQDTSVAL NAS ItFNCD GT LVNATDGVTL TGELNTNLET 780 

DSLTYLSNVT VNGNLTNTSG AVSLQNGVAG DTLTVNGDYT GGGTLLLDSE LNGDDSVSDQ 840 

LVMNGNTAGN TTVWNSITG IGEPTSTGIK WDFAADPTQ FQNNAQFSLA GSGYVNMGAY 90 0 

DYTLVEDNND WYLRSQEVTP PSPPDPDPTP DPDPTQDPDP TPDPEPTPAY QPVLNAKVGG 960 

YLNNLRAANQ AFMMERRDHA GGDGQTLNLR VIGGDYHYTA AGQLAQHEDT STVQLSGDIiF 1020 

SGRWGTDGEW MLGIVGGYSD NQGDSRSSMT GTRADNQNHG YAVGLTSSWF QHGKQKQGAW 10 80 

LDNWLQYAWF SNDVSEHEDG VDHYHSSGII ASLEAGYQWL PGRGWIEPQ AQVIYQGVQQ 1140 

DDFTAANRAR VSQSQGDDIQ TRLGLHSEWR TAVHVIPTLD LNYYHDPHST EIEEDASTIS 12 0 0 

DDAVKQRGEI KVGVTGNISQ RVSLRGSVAW QKGSDDFAQT AGFL SMTVKW 12 5 0 
<212> Type : PRT 
<211> Length : 1250 

SequenceName : SEQ ID 16 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MHSWKKKLIV SQLALACTLA I TS QANAATN DISGQTYNTF HHYNDATYAD DVYYDGYVGW 60 

NNYAADSYYN GDIYPVINNA TVNGVISTYY LDDGISTNTN ANSLTIKNST IHGMITSECM 120 

TTDCADDRAT GYVYDRLTLS VDNSTIDDNY EHYTYNGTYN NAADTHWDV YDMGTAI TLD 180 

QEVDLSITNN S HVAGI TLTQ GYEWEDIDDN TVSTGVNSSE VFNNTITVKD STVTSGSWTD 240 

EGTTGWF GHT GNASNYSNTL TADDVAIAAI ANPYADNAMQ TTVTLDNSTL MGDWFSSNF 3 00 

DENFFPQGAN SYRDADGDVD TNGWDGTDRM DVTLNNGSKW VGAAMSVHMV DEDGDGSYDG 3 60 

YAVGTEATAT LLDIAANSLW PSSTVGVDNI NTQYDENGHI VGNEVYQSGL FNVTLNGGSE 420 

WDTTKSSLID TLSINSGSQV NVADSRLISD TVSLTGGSNL NIGEDGHVAT NTLTIDNSTV 480 

KMSDDVSAGW GLEDAALYAN TITVTNDGLL DINVDQFDAN PFQADTLNLT STTDTNGNIH 540 

AGVFDIHSSD YVMDTDLVND RTNDTTKSNY GYGLIAMNSD GHLTINGNGD NDNTAS I EAG 6 00 

QNEVDNNGDH VAAATGNYKV R I DNATGAGS I AD YNGNEL I YVNDKNSNAT FSAANKADLG 660 

AYTYQAEQRG NTWLQQMEL TDYANMALSI PSANTNIWNL EQDTVGTRLT NSRHGLADNG 720 

GAWVS YFGGN FNGDNGT INY DQDVNGIMVG VDTKIDGNNA KW I VGAAAGF AKGDMNDRSG 7 80 

QVDQDSQTAY I YS SAHF ANN VFVDGSLSYS HFNNDL S ATM SNGTYVDGST NSDAWGFGLK 840 

AGYDFKLGDA GYVTPYGSIS GLFQSGDDYQ LSNDMKVDGQ SYDSMRYELG VDAGYTFTYS 9 00 

EDQALTPYFK LAYVYDDSNN DNDVNGDS ID NGTEGSAVRV GLGTQFSFTK NFSAYTDANY 9 60 

LGGGDVDQDW SANVGVKYTW 980
<212> Type : PRT
<211> Length : 980

SequenceName : SEQ ID 17
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKLKHVGMIV VSVLAMSSAA VSAAEGDESV TTTVNGGVIH FKGEWNAAC AIDSESMNQT SO 
10 VELGQVRSSR LAKAGDLSSA VGFNIKLNDC DTNVSSNAAV AFLGTTVTSN DDTLALQSSA 120 

AGSAQINVGIQ ILDRTGEVLI LDGATFSAKT DLIDGTNILP FQARYIALGQ SVAGTANADA 18 0 

TFKVQYL 187 

<212> Type : PRT
<211> Length : 187

SequenceName : SEQ ID 18
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKLLKVAAIA A I VF S GS ALA GWPQYGGGG GNHGGGGNNS GPNSELNIYQ YGGGNSALAL 60 
QADAR2STS DLT ITQHGGGNGA DVGQGSDDSS IDLTQRGFGN SATLDQWNGK DSHMTVKQFG 12 0 

GGNGAAVDQT ASNSTVNVTQ VGFGNNATAH QY 152

<212> Type : PRT
<211> Length : 152

SequenceName : SEQ ID 19
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MP I GNXjGHNP NVNNSIPPAP PLPSQTDGAG GRGQLINSTG PLGSRALFTP VRNSMADSGD 60 
35 NRASDVPGLP VNPMRLAASE ITLNDGFEVL HDHGPLDTLN RQIGSSVFRV ETQEDGKHIA 120 
VGQRNGVETS WLSDQEYAR LQSIDPEGKD KFVFTGGRGG AGHAMVTVAS DITEARQRIL 18 0 

ELLEPKGTGE SKGAGESKGV GELRESNSGA ENTTETQTST STSSLRSDPK LWLAL GTVAT 240 
GIiIGLAATGI VQALALTPEP DSPTTTDPDA AASATETATR DQLTKEAFQN PDNQKVNIDE 3 00 

LGNAI PSGVL KDDWANIEE QAKAAGEEAK QQAIENNAQA QKKYDEQQAK RQEELKVSSG 3 60 

40 AGYGLSGALI LGGGIGVAVT AALHRKNQPV EQTTTTTTTT TTTSARTVEN KPANNTPAQG 420 
NVDTPGSEDT MESRRSSMAS TSSTFFDTSS I GTVQNPYAD VKTSLHDSQV PTSNSNTSVQ 480 
NMGNTDSWY STIQHPPRDT TDNGARLLGN PSAGIQSTYA RLALSGGLRH DMGGLTGGSN 540 
SAVNTSNNPP APGSHRFV 558

<212> Type : PRT
<211> Length : 558

SequenceName : SEQ ID 20
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MFSTFKKAAL LAAIALPFST MAAPTVTFQG EVTDQTCSVN INGQTNSWL MPTVAMADFG 60
ATLADGQSAG QTPFTVSVSN CQAPTGADQA INTTFLGYDV DASTGVMGNR DTSSDAAKGF 120
GIQLMDSSTS GNPVTLAGAT NVPGLTLKVG DTEASYDFGA RYFVIDSAAA TAGKITAVAE 180
YTLSYL 186

<212> Type : PRT
<211> Length : 186

SequenceName : SEQ ID 21
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNSEGGKPGN VLTVNGNYTG NNGLMTFNAT LGGDNSPTDK MNVKGDTQGN TRVRVDNIGG 60
VGAQTWGIE LIEVGGNSAG NFALTTGTVE AGAYVYTLAK GKGNDEKNWY LTSKWDGVTP 120
ADTPDPINNP PWDPEGPSV YRPEAGSYIS NIAAANSLFS HRLHDRLGEP QYTDSLHSQD 180

SASSMWMRHV GGHERSSAGD GQLNTQANRY VLQLGGDLAQ C ' WSSNAQDRWH LGVMAGYANQ 240 

HSNTQSNRVG YKSDGRISGY SAGLYATWYQ NDANKTGAYV DSWALYNWFD NSVSSDNRSA 3 00 

DDYDSRGVTA SVEGGYTFEA GTCSGSEGTL NTWYVQPQAQ ITWMGVKDSD HARKDGTRIE 3 60 

5 TEGDGNVQTR LGVKTYLNSH HQRDDGKQRE FQPYIEANWI NNS KV YAV KM NGQTVSRDGA 42 0 

RNLGEVRTGV EAKVNNNLSL WGNVGVQLGD KGYSDTQGML GVKYSW 466 
<212> Type : PRT
<211> Length : 466

SequenceName : SEQ ID 22
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MSYLMLRLYQ RNTQCLHIRK HRLAGF FVRIi FVACAFAVQA PLSSAELYFN PRFLADDPQA 60 

VADLSRFENG QELPPGTYRV DIYLNNGYMA TRDVTFNTGD SEQGIVPCLT RAQLASMGLN 12 0 

TASVAGMNLL ADDACVPLTT MVQDATAHLD VGQQRLNLTI PQAFMSNRAR GYIPPELWDP 18 0 

GINAGMiNYN FSGNSVQNRI GGNSHYAYLN LQSGLNIGAW RLRDNTTOSY NSSDRSSGSK 240 

20 NKWQHINTWL ERDIIPLRSR LTLGDGYTQG DIFDGINFRG AQLASDDNML PDSQRGFAPV 3 00 

IHGIARGTAQ VTIKQNGYDI YNSTVPPGPF TINDIYAAGN SGDLQVTIKE ADGSTQIFTV 3 60 

PYSSVPLLQR EGHTRYSITA GEYRSGNAQQ EKPRFFQSTL LHGLPAGWTI YGGTQLADRY 420 

RAFNFG I GKIST M GAL GAL S VD MTQANSTLPD DSQHDGQSVR FLYNKSLNES GTNIQLVGYR 48 0 

YSTSGYFNFA DTTYSRMNGY NIETQDGVIQ VKPKFTDYYN LAYNKRGKLQ LTVTQQLGRS 54 0 

25 STLYLS GSHQ TYWGTSNVDE QFQAGLNTAF EDINWTLSYS LTKNAWQKGR DQMLARNVNI 60 0 

PFSHWLRSDS KSQWRHASAS YSMSHDLNGR MTNLAGVYGT LLEDNNLSYS VQTGYAGGGD 660 

GNSGSTGYAT LNYRGGYGNA NIGYSHSDDI KQLYYGVSGG VLAHANGVTL GQPLNDTWL 72 0 

VKAPGAKDAK VENQTGVRTD WRGYAVLPYA TEYRENRVAL DTNTLADNVD LDNAVANWP 780 

TRGAIVRAEF KARVGIKLLM TLTHNNKPLP FGAMVTSESS QSSGIVADNG QVYLSGMPLA 840 

GKVQVKWGEE EMAHCVANYQ LPPESQQQLL TQLSAECR 878

<212> Type : PRT
<211> Length : 878

SequenceName : SEQ ID 23
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

40 MQIIFGEKCV SLLRLFFAAV LMLWCAQTAA YSGQCHTTQG NPYIGVNFGV KTLEEEENTT 60 

GWKDKFYQW NESNDYYVSC DCDKDNVRSG RWAFAADSPL VYLGDNWYKI NDYLAAKVLL 12 0 

QVKGSSPTAV PFENVGTGAD TRWHI CDPGG QRLGGQGASG NSGSFSLKIL QPFVGSWIP 180 

PMALARLFEC YNIPAGDSCT TTGTPVLVYY LSGTINSLGS CSVNAGETIE VDLGDVFAAN 240 

FRWGHKPLG ARTAELAI P V RCNTGNAGLV NVNLSLTATT DPSYPQAIKT SRPGVGVWT 3 00 

DSQNNXISPA GGTLPLSIPD DADSIA 326

<212> Type : PRT
<211> Length : 326

SequenceName : SEQ ID 24
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

55 MKIKTLAIW LSALSLSSTA ALAAATTVNG GTVHFKGEW NAACAVDAGS VDQTVQLGQV 60 
RTASLAQDGA TSSAVGFNIQ LNDCDTNVAS KAAVAF LGTV I DAGHTNVLA LQSSAAGSAT 120 
NVGVQILDRT GAALTLD GAT FSEQTTLNNG TNTIPFQARY YAIGEATPGA ANADATF KVQ 180 
YQ 182 
<212> Type : PRT
<211> Length : 182

SequenceName : SEQ ID 25
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
MKLKVIATLI ATVAVGVSFN SNFASASTTS ASLTVNSNLT MGTCSAQIMD NSNKVINEW 60
FGNVYISELG AKSKVQQFKI RFSNCSGLPQ NSAQIVLAPN GISCAGSQSS SAGFSNKFTD 120
ASAATRTAVE VWTTDTPESN GSTQFHCAQK IPVPVTLPAD TTTQPYDYPL SARMTVAEGR 180
LVTDVRPGNF RSPTTFTITY Q 201

<212> Type : PRT
<211> Length : 201

SequenceName : SEQ ID 26
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

J&tL S TVE YGET VDGWLEKDI QLVYGTANNT KIHPGGEQHI KEFGISSNTE IITGGYQYIEM 60
NGTAEYSVLN DGYQIVQMGG AANQTTLNNG VLQVYGAAND PTIKGGRLIV EKDGITVLAA 120
IEKGGLLEVK EGGLAIAVDQ KAGGAIKAST RVMEVFGTNR LGQFEIKNGI ANNMLLENGG 180
S LRVE end fa YNTTVDSGGL LEVMDGGTAT GVDKKAGGKL IVSTNALEVS GTNSKGQFSI 240
KDGVSKNYEL DDGSGLIVME DTQAIDTILD EHATMQSLGK DTGTRVQANA VYDLGRSDQN 300
GSITYSSKAI SENMVINNGR ANWAGTMVN VSVRGNDGIL EVMKPQINYA PAMLVGKVW 360
SEGASLRTHG AVDTSKADVS LENSAWTIIA DITTTNQNTR LNLANLAMSG ANVIMMDESV 420
TRSSVTASAE NFTTLTTNTL SGNGNFYMRT DMANHQSDQL NVTGQATGDF KIFVTDTGAS 480
PAAGDSLTLV TTGGGDAAFT LGNAGGWDI GTYEYTLLDN GNHSWSLAEN RAQITPSTTD 540
VliNMAAAQPL VFDAELDTVR ERLGSVKGVS YDTAMWSSAI NTRNNVTTDA GAGFEQTLTG 600
LTLGIDSRFS REESSTIRGL FFGYSHSDIG FDRGGKGNVD SYTLGAYAGW EHQNGAYVDG 660
WKVDRFANT IHGKMSNGAT AFGDYNSNGA GAHVESGFRW VDGLWSVRPY LAFTGFTTDG 720
QDYTLSNGMR ADVGNTRILR AEAGTAVSYH MDLQNGTTLE PWLKAAVRQE YADSNQVKVN 780
DDGKFNNDVA GTRGVYQAGI RSSFTPTLSG HLSVSYGNGA GVESPWNTQA GWWTF 836

<212> Type : PRT
<211> Length : 836

SequenceName : SEQ ID 27
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MQRKGNKLLI QLCSVILLFF TTSWYALANE CYIERNAEGD YHMKISSTQL SLASQMVEVP 60
TEIAEATWDV NIQLRGDAIG CKSLGDSKAV HFLNTADPSL ISTYTTTNGA ALLKTTVPGI 120
VYSVELLCLS CGAADELDLW LPAQSGADNF IPSTQTKWAY EYSDQSWYLR FRLFITPEFK 180
PKNGVSSGTT IAGKIASWYI GTNDQPWINF YIDNDSLKFF VDEPTCATVA LAQDQGNVSG 240
NQVTLGNSYV SEVKNGLTRE IPFSIRAEYC YASKITVKLK AANKPSDATL VGKTTGSASG 300
VAVKVNSTYD NSKVLLKADG SNTVDYNFAA WSNNLLFLPF TAQLVPDGSG NAVGVGTFSG 360
NATFSFTYE 369

<212> Type : PRT
<211> Length : 369

SequenceName : SEQ ID 28
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MYQFTHQKSR IPKKTLLAAC CALFYSSNGA AADTVEYDSS FLMGTGASTI DVKRYAQGNP 60
TPPGLYNVRV FVNGQATSSL EIPFVDIGEN SAAACLTHKN LAQLHIKQPE QPVTLLAREG 120
EEEDCLDLAK SYEKADVCFD GSDQFLDLTI PQAYVLKSYG GYVDPSLWES GINAATLAYT 180
LNAYHTSSDN DNSDSVYGAF NSGINLGAWH FRARGNYNWT TDNGSDFDFQ DRYLQRDIPA 240
IRSQIIMGDA YTTGETFDSV NVRGVRLYSD SRMLPSALAS YAPTIRGVAN SNAKVTVTQS 300
GYKIYETTVP PGEFVIDDIS PSGFGSELW TIEEADGSKR TFTQPFSSW QMQRPGVGRW 360
DFSAGKVIDD SLRSEPNMGQ ASYYYGLNNL FTGYTGIQFT DNNYLAGLLG VGINTSIGAF 420
AVDVTHSRAE IPDDKTYQGQ SYRVTWNKLF QDTGTSFNLA AYRYSTQDYL GLHDALVLID 480
DAKHLSADED KNTMQTYSRM KNQFTVSINQ PLNIAYEDYG SLFISGSWTY YWAANNSRTE 540
YNVGYSKSVS WGSFSVNLQR SWNEDGEKDD AMYVSVSVPI ENILGGKRKS SGFRNLNTQL 600
NTDFDGSHQL NVNSSGNTEN NLVNYSVNAG YSLDKNAGDL ASVGGYLNYE SGLGGISASA 660
SATSDNSQQY SISTDGGFVL HSGGLTFTNN SFSSNDTLVL INALGAKGAR INNSNNEIDR 720
WGYAVTSSVS PYRENRVGLN IETLENDVEL KSTSATTVPR SGSWLTRFE TDEGRSAVLN 780
ITAANGKSIP FAAEVYQGEV MIGSMGQGGQ AFVRGINDSG ELIVRWYENN QTIDCKLHYQ 840
FPAQPQTQGS TNTLLL3STNLT CQVANH 866
<212> Type : PRT
<211> Length : 866

SequenceName : SEQ ID 29
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKFKRLLHSG IASLSLVACG VNAATDLGPA GDIHFSITIT TKACEMEKSD LEVDMGTMTL 60 
QKPAAVGTVL SKKDFTX ELK ECDGISKATV EMDSQSDSDD DSMFALEAGG ATGVALKI ED " 120 

DKGTQQVPKG SSGTPIEWAI DGETTSLHYQ ASYWVNTQA TGGTANALVN FSITYE 176 

<212> Type : PRT
<211> Length : 176

SequenceName : SEQ ID 30
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKYNNIIFLG LCLGLTTYSA LSADSVIKIS GRVLDYGCTV SSDSLNFTVD LQKNSARQFP 60 
25 TTGSTSPAVP FQITLSECSK GTTGVRVAFN GIEDAENNTL LKLDEGSNTA SGLGIEILDG 120 
NMRPVKLNDL HAGMQWXPLV PEQNNILPYS ARLKSTQKSV NPGLVRASAT FTLEFQ 176 

<212> Type : PRT
<211> Length : 176

SequenceName : SEQ ID 31
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKWRKRGYLL AAILALAlSAT IQAADVTITV NGKWAKPCT VSTTNATVDL GDLYSFSLMS 60 
AGAASAWHDV ALELTNCX>VG TSRVTASFSG AADSTGYYKN QGTAQNIQLE LQDDS GNTLN 120 
TGATKTVQVD DSSQSAHFPL QVRALTVNGG ATQGTIQAVI SITYTYS 167 
<212> Type : PRT
<211> Length : 167

SequenceName : SEQ ID 32
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKRAPLITGL LLISTSCAYA SSEGCGADST S GATNYS SW DDVTVNQTDN VTGRE FT SAT 60 
50 LSSTNWQYAC S C S AGKAJVKL VYMVSPVLTT TGHQTGYYKL NDSLDIKTMN RPGNPGD 117 

<212> Type : PRT
<211> Length : 117

SequenceName : SEQ ID 33
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKKALLAAAL VMASGSALAV DGGHIDFNGM VQSGTCKVGV VDTGMHSVTT DGWTLDTAN 60 
VTDTFAEVSA TAVGLLPKEF MISVECDPGA PKNAELTMGS ASYANTSGTL NNNMNITVNG 120 
IAPAQNVNIA VHNMKNKAGA AE I KQVHMNN SSEVQELTLD AEGKGQYVFN ASYVKAPNSP 180 
AVTAGHVTTN ALYTVAYK 198 

<212> Type : PRT
<211> Length : 198

SequenceName : SEQ ID 34
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKPNMIVGAL ALTSVFMAGH LQAADGTVHF RGEIIDSTCE VTPETKDQW DLGKVNRTAF 60 
SGVDDVAAPT AFSIDLTQCP ETFKSAAIRF DGNEDAHGNG NLAIGTPLDN SNDAAAGISP 12 0 

SDNSGDYTGA GAVSAAKGVA IRLYNRADNT QVKLYENSAS TPISNGNASM KFMARYIATE 18 0 

TTIDPGTANA DSQFTVEYIK 200

<212> Type : PRT
<211> Length : 200

SequenceName : SEQ ID 35
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MPIFQREGHL KYSFAAGEYQ AGNYDSASPR FGQLDLIYGL PWGMTAYGGV LISNNYNAFT 60 

LGIGKNFGYI GAISIDVTQA KSELNNDRDS QGQSYRFLYS KSFESGTDFR LAGYRYSTSG 12 0 

FYTFQEATDV RSDADSDYNR YHKRSEIQGN LTQQLGAYGS VYLNLTQQDY WNDAGKQNTV 18 0 

SAGYNGRIGK VSYSIAYSWN KSPEWDESDR LWSFNISVPL GRAWSNYRVT TDQDGRTNQQ 240 

VGVSGTLLED RNLSYSVQEG YASNGVGNSG NANVGYQGGS GNVNVGYSYG KDYRQLNYSV 3 00 

RGGVIVHSEG VTLSQPLGET MTLISVPGAR NARWNNGGV QVDWMGNAIV PYAMPYRENE 3 60 

ISLRSDSLGD DVDVENAFQK WPTRGAIVR ARFDTRVGYR VLMTLLRSAG SPVPFGATAT 420 
LITDKQNEVS SIVGEEGQIiY ISGMPEEGRV LIKWGNDASQ QCVAPYKLSL ELKQGGIIPV . 480 

SANCQ 485 
<212> Type : PRT
<211> Length : 485

SequenceName : SEQ ID 36
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MSGYTVKPPT GDSNEQTQFI DYFNLFYSKR DQEQISISQQ LGNYGATFFS ASRQSYWNTS 60 
RSDQQISFGL NVPFGDITTS LNYSYSNNIW QNDRDHLLAF TLNVPFSHWM RTDSQSAFRN 12 0 

SNASYSMSND LKGGMTNLSG VYGTLL PDNN LNYSVQVGNT HGGNTSSGTS GYS TLNYRGA 180 
YGNTNVGYSR SGDSSQIYYG MSGGIIAHAD GITFGQPLGD TMVLVKAPGA DNVKIENQTG 240 
IHTDWRGYAI LPFATEYREN RVALNANS LA DNVELDETW TVIPTHGAIA RATFNAQIGG 3 00 

KVLMTLKYGN KSVPFGAIVT HGENKNGS IV AENGQVYLTG LPQSGKLQVS WGNDKNSNCI 3 60 

VDYKLPEVSP GTLLNQQTAI CR 382 
<212> Type : PRT
<211> Length : 382

SequenceName : SEQ ID 37
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MSALYERSQL TQVMISSAPA TAETMEKAEY LRLDCTIKEV QFTAGQKQDI DVTTLCSTEQ 60 

ENINGLGASS EISMSGNFYL NQAQNALRDA YDNDTVYAFK VQFPSGKGFK FLAEVRQHTW 120 

SSGTNGWAA TFSLRLKGKP VSYWPLAFV KNLDKTLTVN TGALLTMSVS VNGGTPPYKH 18 0 

AWKKDGQPVE GQTTDTFSKP GAQSGDKGAY TCEVTDSAEQ PQSITSDACT VTVNGAGG 238

<212> Type : PRT
<211> Length : 238

SequenceName : SEQ ID 38
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :



MRNKPFYLLC AFLWLAVSHA LAADSTITIR GYVRDNGCSV AAESTNFTVD LMENAAKQFN 60 
NTGATTPWP FRILLS PCGNT A.VS AVKVGF T GVADSHNANL LALENTVSAA SGLGIQLLNE 12 0 

QQNQIPLNAP SSAISWTTLT E>GKPNTLNFY ARLMATQVPV TAGHINATAT FTLEYQ 176 

<212> Type : PRT
<211> Length : 176

SequenceName : SEQ ID 39
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNKSWSISA AMLVLLCQPV MGSEISPATP • SDEDNYTFD'F QLFRGSRFSQ SSLAKLTTRE 60 

15 SVAPGNYKMD IYTNNKLSGS WNVTFKEAAD GRVLPCLTPE VADA I GLKTG EDKGEKDPVC 120 

TFAKELAPGI TSQTQLSQLR LDLSVPQSQL ISRPRGYVPP SELDTGASLA FMNYIANYYN 18 0 

VAYSGQNAHS QRSLWASFNG GINLGAWQYR QLSNMTWDND KGNQWNNIRS YLQRPLPAIN 240 

SQLMMGQLIT SGRFFSGLSY HGVS LATDER MLPDSMRGYA PTIRGVAATM ARVSVMQNGH 3 00 

EIYQTTVAPG PFEINDLYPT SYSGDLDVTV TEANGAVSRF SVPFSAVPES MRPGTSRYNV 3 60 

20 EVGKTQDSGD DSMFGDLTWQ HGMTNTLTFN SGSRIADGYQ ALMLGGVYGS SLGAFGANLT 42 0 

WSHARVPESE AQSGWMSQLT WSKTFQPTST TVS LAG YRYS TSGYRDLADV LGERHAASNK 480 

QSWDSSQWRQ QSRFDLTLSQ SLANYGNLFV SGSTQNYRGG KSRDTQLQLG YSNSFSHGIS 540 

MNLSVGRQRM GGYKDNSDDM QTVTSLSFSF PLGGNGPRVP SLSNSWTHST DGSSQLQSSL 600 

TGMLDEAQTT NYSLNVMRDQ QYKQTTLSGN MQKRFSQTTV GLNASKGQDY WQASGNVQGA 660 

25 MAVHGGGITF GPYLGETFAL VEAKGAEGAK VYNSSQLEIN DSGYALVPAV TPYRYNRISL 720 

DPQGMDGDAE LVDSERQVAP VAGAAVKVIF RTRPGKALL I KSRMADGSEL PMGADVLDEN 780 

NTWGIAGQG GQIYLRTEQT KGHLSVRWGE GANDSCQLPF DISGKDSNSP IIRLNETCQS 840 

<212> Type : PRT
<211> Length : 840

SequenceName : SEQ ID 40
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKLAACFLTL LPGFAVAASW TSPGFPAFSE QGTGTFVSHA QLPKGTRPLT LNFDQQCWQP 60 

ADAIKLNQML SLQPCSNTPP QWRLFRDGKY TLQIDTRSGT PTLMISIQNA AEPVANLVRE 12 0 

40 CPKWDGLPLT LDVSATFPEG AAVRDYYSQQ IAIVKNGQIT LQPAATSNGL LLLERAETDA 18 0 

SAPFDWHNAT VYFVLTDRFE 1STGDPSNDQSY GRHKDGMAE I GTFHGGDLRG LTNKLDYLQQ 24 0 

LGVNALW I S A PFEQIHGWVG GGTKGDFPHY AYHGYYTQDW TNLDANMGNE ADLRTLVDSA 3 00 

HQRGIRILFD WMNHTGYAT LADMQEYQFG ALYLSGDEVK KTLGERWSDW KPAAGQTWHS 3 60 

FNDYINFSDK TGWDKWWGKN WIRTDIGDYD NPGFDDLTMS LAFLPDIKTE STTASGLPVF 420 

45 YKNKTDTHAK AIDGFTPRDY LTHWLSQWVR DYGIDGFRVD TAKHVEL PAW QQLKTEASAA 48 0 

LREWKKANPD KALDDKPFWM TGEAWGHGVM QSDYYRHGFD AMINFDYQEQ AAKAVDCIAQ 540 

MDTTWQQMAE KLQGFNVLSY LSSHDTRLFR EGGDKAAELL LLAPGAVQIF YGDESSRPFG 60 0 

PTGSDPLQGT RSDMNWQDVS GKSAANVAHW QKISQFRARH PAIGAGKQTT LSLKQGYGFV 660 

REHGDDKVLV IWAGQQ 676 

<212> Type : PRT
<211> Length : 676

SequenceName : SEQ ID 41
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MPQRHHQGHK RTPKQLALII KRCLPMVLTG SGMLCTTANA EEYYFDPIML ETTKSGMQTT 60 

60 DLSRFSKKYA QLPGTYQVDI WLNKKKVSQK KITFTANAEQ LLQPQFTVEQ LRELGIKVDE 12 0 

I PALAEKDDD SVINSLEQII IPGTAAEFDFN" HQRLNLSIPQ IALYRDARGY VSPSRWDDGI 180 

PTLFTNYSFT GSDNRYRQGN RSQRQYLNMQ NGANFGPWRL RNYSTWTRND QASSWNTISS 240 

YLQRD I KALK SQLLLGESAT SGSIFSSYNF TGVQLASDDN MLPNSQRGFA PTVRGIANSS 300 

AIVTIRQNGY VIYQSNVPAG AFEINDLYPS SNSGDLEVTI EESDGTQRRF IQPYSSLPMM 3 60 

65 QRPGHLKYSA TAGRYRADAN SDSKEPEFAE ATAIYGLNNT FTLYGGLLGS EDYYALGIGI 420 

GGTLGALGAL SMDINRADTQ FDNQHSFHGY QWRTQYIKDI PETNTNIAVS YYRYTNDGYF 480 

SFDEANTRNW DYNSRQKSEI QFNISQTIFD GVSLYASGSQ QDYWGNNEKN RNISVGVSGQ 540 
QWGIGYSLNY QYSRYTDQNTff DRALSLNLSI PLERWLPRSR VSYQMTSQKD RPTQHEMRLD 600

GSLLDDGRLS YSLEQSLDDD NNHNSSVNAS YRSPYGTFSA GYSYGNDSSQ YNYGVTGGW 660 

IHPHGVTLSQ YLGNAFALXD ANGASGVRIQ NYPGIATDPF GYAWPYLTT YQENRLSVDT 720 

TQLPDNVDLE QTTQ FWP3STR GAMVAARFNA NIGYRVLVTV SDRNGKPLPF GALASNDDTG 78 0 

QQSIVDEGGI LYLSGISSKS QSWTVRWGNQ ADQQCQFAFS TPDSEPTTSV LQGTAQCH 838

<212> Type : PRT
<211> Length : 838

SequenceName : SEQ ID 42
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MMFRNRILLI FILWANFTWA GCRTTASLNI TDGINVGEIL ANETSFSKSV VFTGISCDTS 60 
TDKIVYKNIQ SDWVEVGPFG NGEKLKVKIE SLGKTSDTIG KSSNAQAVLP YWKIARGTP 12 0 

DFTGERKSTW FISDTVIANT GGESSSSIDF WLGICKALKF NWCVNYLTSK LAGDTFTLGL 180 
NISYYPKNTT CKPENTVIKV DDIALFQLRN QGKIAANSKE GTITLKCDNL FGDKKQASRN 24 0 

20 MWYLSSSDL VKGSNTILRG KTDNGVGFVL DLTEPPKGTE AAIKISANGD QGAATSLWKT 3 00 

DKPGVSLNSN IINIPVMASY YVYDEKKVKS GALEATALIN VKYD 344
<212> Type : PRT
<211> Length : 344

SequenceName : SEQ ID 43
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MIKKASLLTA CSVTAFSAWA QDTSPDTLW TANRFEQPRS TVLAPTTWT RQDIDRWQST 60 

SVNDVLRRLP GVDITQNGGS GQLSSIFIRG TNASHVLVLI DGVRLNLAGG SGSADLSQFP 12 0 

IALVQRVEYI RGPRSAVYGS DAIGGWNII TTRDEPGTE I SAGWGSNSYQ NYDVSTQQQL 18 0 

GDKTRVTLLG DYAHTHGYDV VAYGNTGTQA QPDNDGFLSK TLYGALEHNF TDAWSGFVRG 240 

35 YGYDNRTNYD AYYSPGSPLV DTRKLYSQSW DAGLRYNGEL IKSQLITSYS HS KDYNYDPH 3 00 

YGRYDSSATL DEMKQYTVQW ANNIIIGHGN VGAGVDWQKQ STAPGTAYVK DGYDQRNTGI 36 0 

YLTGLQQVGD FTFEGAARSD DNSQFGRHGT WQTSAGWEFI EGYRFIASYG TSYKAPNLGQ 42 0 

LYGFYGNPNL DPEKSKQWEG AFEGLTAGVN WRI S GYRND V SDLIDYDDHT LKYYNEGKAR 48 0 

IKGVEATANF DTGPLTHTVS YDYVDARNAI TDTPLLRRAK QQVKYQLDWQ LYDFDWGITY 540 

40 QYLGTRYDKD YSSYPYQTVK MGGVSLWDLA VAYPVTSHLT VRGKIANLFD KDYETVYGYQ 60 0 

TAGREYTLSG SYTF 614 
<212> Type : PRT 
<211> Length : 614 

SequenceName : SEQ ID 44
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKNKLL FMML TILGAPGIAA AAGYDLANSE YNFAVNELSK SSFNQAAIIG QAGTNNSAQL 60 
RQGGSKLLAV VAQEGSSNRA KIDQTGDYNL AYIDQAGSAN DAS ISQGAYG NTAMI IQKGS 120 
GNKANITQYG TQKTAIWQR QSQMAIRVTQ R 151 
<212> Type : PRT 
<211> Length : 151

SequenceName : SEQ ID 45
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MN I FAYLLVL VFSMSMSSSA FASWMTGTR IIFPGDAKEK TIQLRNTSDQ PYIINIHVED 60 

ERGSDKNVPF MPTPQTFRME AAAGQALRLL YTGNNLPQDR ESVFWFSFSQ LPYLNKNDKS 120 

65 QNQLILALTN RVKIFYRPSS IVGKSSDAPK NLTYQVKQNR IEVTNPTGYY VTIRAAELLN 180 

NGKKVPLANS VMIAPQSTTE WTLPSGISVA PGAQIHLVTV NDYGVNVTSE HAL 233 
<212> Type : PRT
<211> Length : 233

SequenceName : SEQ ID 46
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKRLHKRFLL AT F CALL TAT LQAADVTITV NGRWAKPCT IQTKEANVNL GDLYTRNLQQ 60 
PGSASGWHNI TLSLTDCPAE TSAVTAIVTG STDNTGYYKN EGTAENIQIE LRDDQDATLK 12 0 

NGDSKTVIVD EITRNAQFPL KARAITVNGN ASQGTIEALI NVIYTWQ 167 
<212> Type : PRT 

<211> Length : 167

SequenceName : SEQ ID 47
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MRAKLLGIVL TTPIAISSFA STETLSFTPD NINADISLGT LSGKTKERVY LAE EGGRKVS 60 
QLDWKFNNAA IIKGAINWDL MPQISIGAAG WTTLGSRGGN MVDQDWMDSS NPGTWTDESR 12 0 

HPDTQLNYAN EFDLNIKGWL LNEPNYRLGL MAGYQES RYS FTARGGSYIY S S EEGFRDD I 18 0 

GSFPNGERAI GYKQRFKMPY IGLTGSYRYE DFELGGTFKY SGWVEASDND EHYDPGKRIT 240 
YRSKVKDQNY YSVSVNAGYY VTPNAKVYVE GTWNRVTNKK GNTSLYDHND NTSDYSKNGA 3 00 

GIENYNFITT AGLKYTF 317 
<212> Type : PRT 
<211> Length : 317 

SequenceName : SEQ ID 48 

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MFFKRGKILS AGRLNKKSLG IVMLLSVGLL LAGCSGSKSS DTGTYSGSVY TVKRGDTLYR 60 
ISRTTGTSVK ELARLNGISP PYTIEVGQKL KLGGAKSSSS TRKSTAKSTT KTASVTPSSA 12 0 

VPKSSWPPVG QRCWLWPTTG KVIMPYSTAD GGNKGIDISA PRGTPIYAAG AGKWYVGNQ 180 
LRGYGNL I MI KHSEDYITAY AHNDTMLVNN GQSVKAGQKI ATMGSTDAAS VRLHFQIRYR 240 
ATAIDPLRYL PPQGSKPKC 2 59 

<212> Type : PRT 
<211> Length : 259 

SequenceName : SEQ ID 49 

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MPTPNPLAPV KGAGTTLWVY NGNGDP YANP LSDNDWSRLA KVKDLTPGEL TAESYDDSYL 60 
DDEDADWAAT GQGQKSAGDT SFTLAWMPGE QGQQALLAWF NEGDTRAYKI RFPNGTVDVF 120 
RGWVSSIGKA VTAKEVITRT VKVTNVGRPS MAEDRS TVTA ATGMTVTPAS TSWKGQSTT 180 
LTVAFQPEGA TDKSFRAVSA DKTKATVSVS GMT I TVKGVA AGKVNIPWS GNGE FAAVAE 24 0 

INVTAS 246 
<212> Type : PRT 
<211> Length : 246 

SequenceName : SEQ ID 50 

SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MSALYERSQL TQVMISSAPA TAETMDKAEY LRLDCTIKEV QFTAGQKQDI DVTTLCSTEQ 60 
ENINGLGASS EISMSGNFYL NQAQNALRDA YDNDALYAFK VLFPSGKGFK FLAEVRQHTW 120 
SSGTNGWAA TFSLRLKGKP VSFWPLAFV KNLDKTLTVN TGALLTMSVS ANGGTPPYKY 180 



AWKKDGQPVD GQTTDTFSKP GAQSADAGKY TCWTDSAEK AQSVTSVECT VTVSAAAG 238

<212> Type : PRT 
<211> Length : 238 

SequenceName : SEQ ID 51 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MKKSTLALW MGIVASASVQ AAEIYNKDGN KLDVYGKVKA MHYMSDNDSK DGDQSYIRFG 60 

FKGETQINDQ LTGYGRWEAE FAGNKAESDT AQQKTRLAFA GLKYKDLGSF DYGRNLGALY 12 0 
DVEAWTDMFP EFGGDSSAQT DNFMTKRASG LATYRNTDFF GVIDGLNLTL QYQGTOsFENRD 180

VKKQNGDGFG TSLTYDFGGS DFAISGAYTN SDRTNEQNLQ SRGTGKRAEA WATGLKYDAN 240

NIYLATFYSE TRKMTPITGG FANKTQNFEA VAQYQFDFGL RPSLGYVLSK GKDIEGIGDE 3 00 

DLVN YI D VGA TYYFNKNMSA FVDYKINQLD SDNKLNINND DIVAVGMTYQ F 3 51 

<212> Type : PRT 
<211> Length : 351

SequenceName : SEQ ID 52 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MRVKHAWLL MLISPLSWAG TMTFQFRNPN FGGNPNNGAF LLNSAQAQNS YKDPSYNDDF 60 
GIETPSALDN FTQAIQSQIL GGLLSNINTG KPGRMVTNDY IVDIANRDGQ LQLNVTDRKT 12 0 

GQTSTIQVSG LQNNSTDF 138
<212> Type : PRT 
<211> Length : 138 

SequenceName : SEQ ID 53 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

40 MKRKVLAMLV PALLVAGAAN AAEIYNKDGN KLDLYGKVAG LHYFSDDASS DGDMSYARIG 60 
FKGETQIADQ FTGYGQWEFN IGANGPESDK GNTATRLAFA GFGFGQNGTF DYGRNYGWY 120 
DVEAWTDML P EFGGDTYAGA DNFMNGRANS V.A.T YRNNGF F GQVD GLNFAL QYQGNNEKSG 18 0 

LFDQEGSGNG NGRKLAKENG DGSVCPLPMT LTLV 214 
<212> Type : PRT 

<211> Length : 214

SequenceName : SEQ ID 54 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MNTVTLEGGT FNNNGTLNDV VKIEKNSNAV IJSTNTGSLSTL QLHDGTVNNS GIASARVNAQ 60 

GDAVFNNLAG GEARKGAILY NSAWNNAGT WKMGYQDENN NAGTLDIDDK STFNNSGKLI 12 0 

55 LDNSKNAIRF QGSNANATLY NTGEMTLDAA JL» GAGAI L YDD GASEFINKGV VDAKVTVAVS 180 

TAGATESDAF LWNQDGGVIN FDKDNASAVK FTHNNYVALN DGVMNISGNN AVAMEGDKNA 240 

QLVNNGVINL GTEGTTDTGL TGMQLDANAT ADAVIENNGT INI FANDS FA FSVLGTEGHI 3 00 

VNNGTWIAD GVTGSGLIKQ GDSVNVEGVN G3STSGNNTEVH YTDYTLPDMP NTYTTSPFSE 3 60 

TTDSGSSDGS SNNLNGYIVG TNVDGSAGKL KVNN A S MNGV GINTGFAAGT ADTTVS FDNV 42 0 

60 VEGINLTDAD AITSTSWWT AKGSTDASGN VTDVIMSKNAY TDVATDASVN DVAKALDAGY 480 

TNNELYTSLN VGTTAELNSA LKQVSGSQAT TVF REARVL S NRFSMLADAA PKVGNGLAFN 540 

WAKGDPRAE LGNNTEYDML ALRKTVDLSE SQSMSLEYGI ARLDGDGAQK AGDNGVTGGY 600 

SQFFGLKHQM SFDNGMRWNN ALRYDVHNLD SSRSVAYGDV SKTADTDVKQ Q YLELRS EGA 660 

KTFEPREGLK ITPYAGVKLR HSLEGGYQER HAGDFNLSMN SGSETAVDSI VGLKLDYAGK 720 

65 GGWSANATLE GGPNLSYSKS QRTASLAGAG SQHFNVDDGQ KGGGINSLAS VGVKYSSKES 780 

SLNLDAYHWK EDGI SDKGVM LNFKKTF 807 
<212> Type : PRT 
<211> Length : 807

SequenceName : SEQ ID 55 
SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MLNGISNAAS TLGRQLVGIA SRVSSAGGTG FSVAPQAVRL TPVKVHSPFS PGSSNVNART 60 
10 IFNVSSQVTS FTPSRPAPPP PTSGQASGAS RPLPPIAQAL KEHLAAYEKS KGPEALGFKP 120 
ARQAPPPPTS GQASGASRPL PPIAQALKEH LAAYEKSKGP EALGFKPARQ APPPPTSGQA 180 
SGASRPLPPI AQALKEHLAA YEKS KGPEAL GFKPARQAPP PPTGPSGLPP LAQALKDHLA 240 
AYEQSKKG 248 
<212> Type : PRT
<211> Length : 248

SequenceName : SEQ ID 56 
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MNKKIHSLAL LVNLGIYGVA QAQEPTDTPV SHDDTIWTA AEQNLQAPGV STITADEIRK 60 

NPVARDVSEI IRTMPGVNLT GNSTSGQRGN NRQIDIRGMG PENTLILIDG KPVSSRNSVR 12 0 

25 QGWRGERDTR GDTSWVPPEM IERIBVLRGP AAARYGNGAA GGWNIITKK GSGEWHGSWD 18 0 

AYFNAPEHKE EGATKRTNFS LTGPLGDEFS FRLYGNLDKT QADAWDINQG HQSARAGTYA 240 

TTLPAGREGV INKDINGWR WDFAPLQSLE LEAGYSRQGN LYAGDTQNTN SDAYTRSKYG 3 00 

DETNRLYRQN YSLTWNGGWD NGVTTSNWVQ YEHTRNSRIP EGLAGGTEGK FNEKATQDFV 3 60 

DNDLDDVMLH SEVNLPIDFL VNQTLTLGTE WNQQRMKDLS SNTQALTGTN TGGAIDGVSA 420 

30 TDRSPYSKAE IFSLFAENNM ELTDSTIVTP GLRFDHHS IV GNNWSPALNI SQGLGDDFTL 48 0 

KMGIARAYKA PSLYQTNPNY ILYSKGQGCY ASAGGCYLQG NDDLKAETSI NKEIGLEFKR 540 

DGWLAGITWF RNDYRNKIEA GYVAVGQNAV GTDLYQWDNV PKAWEGLEG SLNVPVSETV 600 

MWTNNITYML KSENKTTGDR LSIIPEYTLN STLSWQARED LSMQTTFTOY GKQQPKKYNY 660 

KGQPAVGPET KEISPYSIVG LSATWDVTKN VSiLTGGVDNL FDKRLWRAGN AQTTGDLAGA 720 

35 NYIAGAGAYT YNEPGRTWYM SVNTHF 746 
<212> Type : PRT 
<211> Length : 746 

SequenceName : SEQ ID 57 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MGGRFSLRYK KLSYRFVFLT LAGCSSVGNQ SLKNETQESV KTKIVKGKTT KQDVLASFGE 60

PDSRSLIDGE EQWSYTMYNS QSKATSFIPV VGLLAGGADS QTKSLTVSFK GEKVSTYIFN 12 0 

AGTSNVKTGI F 131 

<212> Type : PRT 

<211> Length : 131 
SequenceName : SEQ ID 58

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MKKIACLSAL AAVLAFTAGT SVAATSTVTG GYAQSDAQGQ MNKMGGFNLK YRYEEDNS PL 60 

GVIGSFTYTE KSRTASSGDY NKNQYYGITA GPAYR INDWA S I YGWGVGY GKFQTTEYPT 12 0 

YKHDTSDYGF SYGAGLQFNP MENVALDFSY EQSRIRSVDV GTW I AGVGYR F 171 



<212> Type : PRT 

<211> Length : 171 

SequenceName : SEQ ID 59 
SequenceDescription : 

Sequence 
<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MKSIATLWC AISGIACVNL SAHAAEGEHT I SLGYAHFQF P GL KD F VKD A TAHNRETFSH 60 
FVNRNYFSSL GEYTDGRVSG YEGKDKNPQG INIRYRYEIT D3DFGVITSFT WTRSLTNSQT 120 
FIDVQSADHT RKI KNPAAS A RTDIRANYWS LLAGPSWRVN QYMSLYAMAG MGVAKVSADL 180 
KIKDNINSSG GFSESNSTKK TSLAWAAGAQ FNLNESVTLD VAYEGSGSGD WRTSGVTAGI 240 
GLKF 244

<212> Type : PRT
<211> Length : 244

SequenceName : SEQ ID 60

SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MRKLYAAILS AAICLTVSGA PAWASEQQAT LSAGYLHVST KTAPGSDNLNG INVKYRYEFT 6 0 

DTLGLVTSFS YAGDRNRQ I T RYSDTRWHED SVRNRWFSVM A.GP S VRVNE W FSAYAMAGVA 12 0 

YSRVSTFSGD YLRVTDNKGK THDVLTGSDD GRHSNTSLAW GAGVQFNPTE SVAIDIAYEG 18 0 

SGSGDWRTDG FIVGVGYKP 199
<212> Type : PRT 
<211> Length : 199 

SequenceName : SEQ ID 61 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MRKLYAAILS AAICLAVSGA PAWASEQQAT L S AGYLHART SAPGSDNLNG INVKYRYEFT 60 
DTLGLVTSFS YAGDKNRQLT RYSDTRWHED SVRNRWFSVM A.GPSVRVNEW FSAYAMAGVA 12 0 

YSRVSTFSGD YLRVTDNKGK THDVLTGSDD GRHSNTSLAW GAGVQFNPTE SVAIDIAYEG 180 
SGSGDWRTDG FIVGVGYKF 199 
<212> Type : PRT 
<211> Length : 199 

SequenceName : SEQ ID 62 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MRKLYAAILS AAICLAVSGA PAWASEQQAT L S AGYLHART SAPGSDNLNG INVKYRYEFT 60 
DTLGLVTSFS YAGDKNRQLT RYSDTRWHED SVRNRWFSVM AGPSVRVNEW FSAYAMAGVA 12 0 

YSRVSTFSGD YLRVTDNKGK THDVLTGSDD GRHSNTSLAW GAGVQFNPTE SVAIDIAYEG 180 
SGSGDWRTDG FIVGVGYKF 199 
<212> Type : PRT 
<211> Length : 199 

SequenceName : SEQ ID 63 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MRKLYAAILS AAICLAVSGA PAWASEQQAT L S AGYLHART SAPGSDNLNG INVKYRYEFT 60 
DTLGLVTSFS YAGDKNRQLT RYSDTRWHED SVRNRWFSVM AGPSVRVNEW FSAYAMAGVA 12 0 

YSRVSTFSGD YLRVTDNKGK THDVLTGSDD GRHSNTSLAW GAGVQFNPTE SVAIDIAYEG 18 0 

SGSGDWRTDG FIVGVGYKF 199 
<212> Type : PRT 
<211> Length : 199 

SequenceName : SEQ ID 64 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MVMSQKTLFT KSALAVAVAL ISTQAWSAGF QLNEFSSSGL GRAYSGEGAI ADDAGNVSRN 60 
PALITMFDRP TFSAGAVYID PDVNISGTSP SGRSLKADNI APTAWVPNMH FVAPINDQFG 12 0 

WGASITSNYG LATEFNDTYA GGSVGGTTDL ETMNXiNLSGA YRLNNAWSFG LGFNAVYARA 180 
5 KIERFAGDLG QLVAGQIMQS PAGKTPQGQA LAATANGIDS NTKIAHLNGN QWGFGWNAGI 240 
LYELDKNNRY ALTYRSEVKI DFKGNYSSDL NRVFIsTNYGL P IPTATGGATQ SGYLTLNLPE 3 00 

MWEVSGYNRV DPQWAIHYSL AYTSWSQFQQ LKATSTSGDT LFQKHEGFKD AYRIALGTTY 360 
YYDDNWTFRT GIAFDDSPVP AQNRSISIPD QDRFWLSAGT TYAFNKDASV DVGVSYMHGQ 42 0 

SVKINEGPYQ FESEGKAWLF GTNFNYAF 448 
<212> Type : PRT

<211> Length : 448 

SequenceName : SEQ ID 65 

SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MAFSQAVSGL NAAATNLDVI GNNIANSATY GFKSGTASFA DMF AGS KVGL GVKVAGITQD 60 
20 FTDGTTTNTG RGLDVAISQN GFFRLVDSNG SVFYSRNGQF KLDENRNLVN MQGLQLTGYP 12 0 

ATGTPPTIQQ GANPTNISIP NTLMAAKTTT TASMQINLNS SDPLPSVNAF DASNADSYNK 180 
KGSVTVFDSQ GNAHDMSVYF V KTGDNNWQ V YTQDSSDPTG TAEPAMKLVF NANGVLTSNP 240 
TENITTGAIN GAEPATFSLS FLNSMQQNTG ANNIVATTQN GYKPGDLVSY QINDDGTWG 3 00 

NYSNEQTQLL GQIVLANFAN NEGLAS EGDN VWSATQSSGV ALL GTAGTGN FGTLTNGALE 3 60 

25 ASNVDLSKEL VNMIVAQRNY QSNAQTIKTQ DQILETTLVNL R 4 01 

<212> Type : PRT 
<211> Length : 401 

SequenceName : SEQ ID 66 
SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

35 MSKSTFLHIL ISSIILVALI QSSAWANCTN TQIGQTEDGR TALIEFGKIN MTDTYFAPAG 60 

SLLATTWPP TNYTSGGATG SSVLWECDAT DLPEFIYFLVA TNGDDRVGGF YDAGGPDGLS 12 0 

DVYATWFAFV GLKQTMAGVT LGRYWKKVP I TSYA-TQGTKI QIRLQDIPPL HAELYRISTL 18 0 

PDTSATTSWC GNNNTDSSGV GFAKPSGTIY NCVQPNAYIQ LSGTSGILFG HDEPGEDSSV 240 

HWDFWGADNG FGYGMRSANR LYNNATCVAR SATPLVLLPT IAEAQLNAGM ESTGNFNVRV 3 00 

40 ECSNSVQSGI SDTQTALGIQ VSEGAYTAAQ KLGTINSNGG VSALVSDNYD AAEMAKGVGI 3 60 

YISNSAHPDT AMTLVGQPGI AKLTPGGNAA GWYPVFEGAT LEGATHPGYS SYSYSFIARL 42 0 

KKLPNQTVSA GKVRATAYIL VKMQ 444 
<212> Type : PRT 
<211> Length : 444 

SequenceName : SEQ ID 67

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MENNRNFPAR QFHSLTFFAG LCIGITPVAQ ALAA.EGQTNA DDTLWEAST PSLYAPQQSA 60 

DPKFSRPVAD TTRTMTVISE QVIKDQGATN LTDALKNVPG VGAF FAGENG NSTTGDAIYM 120 

RGADTSNS I Y IDGIRDIGSV SRDTFNTEQV EVIKGPSGTD YGRSAPTGS I NM I S KQPRND 180 

55. SGIDASASIG SAWFRRGTLD VNQVIGDTTA VRLNVMGEKT HDAGRDKVKN ERYGVAPSIA 240 

FGLGTANRLY LNYLHVTQHN TPDGGIPTIG LPGYSAPSAG TATLNHSGKV DTHNFYGTDS 3 00 

DYDDSTTDTA TMRFEHDIND NTTIRNTTRW SRVKQDYLMT AIMGGASNIT QPTSDVNSWT 3 60 

WSRTANTKDV SNKILTNQTN LTSTFYTASI GHDVSTGVEF TRETQTNYGV NPVTLPAVNI 420 

YHPDSSIHPG GLTRNGANAN GQTDTFAIYA FDTLiQITRDF ELNGGI RLDN YHT E YDS ATA 480 

60 CGGSGRGAIT CPAGVAKGSP VTTVDTAKSG NLVNWKAGAL YHL TENGNVY INYAVSQQPP 540 

GGNNFALAQS GSGNSANRTD FKPQKANTSE IGTKWQVLDK RLLLTAALFR TDIENEVEQN 600 

DDGTYSQYGK KRVEGYE I S V AGNITPAWQV IGGYTQQKAT I KNGKD VAQD GSSSLPYTPE 660 

HAFTLWSQYQ ATDDISVGAG ARYIGSMHKG SDGAVGTPAF TEGYWVADAK LGYRVNRNLD 720 

FQLNVYNLFD TDYVASINKS GYRYHPGEPR TFLLTANMHF 760 

<212> Type : PRT

<211> Length : 760 

SequenceName : SEQ ID 68 
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MQMKKLLPIL IGLSLSGFSS LSQAENLMQV YQQARIiSNPE LRKSAADRDA AFEKINEARS 60 

PLLPQLGLGA DYTYSNGYRD ANGINSNATS ASLQLTQSIF DMSKWRALTL QEKAAGIQDV - 120 

TYQTDQQTLI LNTATAYFNV LNAIDVLSYT QAQKEAIYRQ LDQTTQRFNV GLVAITDVQN 18 0 

10 ARAQYDTVLA NEVTARNNLD NAVEQLRQIT GNYYPELAAL NVENFKTDKP QPVNALLKEA 240 

EKRNLSLLQA RLSQDIiAREQ IRQAQDGHLP TLDLTASSGI SDTSYSGSKT RGAAGTQYDD 3 00 

SNMGQNKVGL SFSLPIYQGG MVNSQVKQAQ YNFVGASEQL ESAHRSWQT VRSSFNNINA 3 60 

SISSINAYKQ AWSAQSSLD AMEAGYSVGT RTIVDVLDAT TTLYNAKQEL ANARYNYLIN 420 

QLNIKSALGT LNEQDLLALN NALSKPtfSTN PENVAPQTPE QN&IADGYAP DSE&PVYQQT 48 0 

15 SARTTTSNGH NPFRN 495 



<212> Type : PRT 
<211> Length : 495 

SequenceName : SEQ ID 69

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

25 MTKLKLLALG VL I ATSAGVA HAEGKFSLGA GVGWEHPYK DYDTDVYPVP VINYEGDNFW 60 
FRGLGGGYYL WNDATDKLSI TAYWSPLYFK AKDSGDHQMR HLDDRKS TMM AGLSYAHFTQ 120 
YGYLRTTLAG DTLDNSNGIV WDMAWLYRYT NGGLTVTPGI GVQWNSENQN EYYYGVSRKE 180 
SARSGLRGYN SNDSWSPYLE LSASYNFLGD WSVYGTARYT RLSDEVTDSP IVDKSWTGLI 240 
STGITYKF 248 

<212> Type : PRT
<211> Length : 248

SequenceName : SEQ ID 70
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKKTLLAAGA VLALSSSFTV NAAENDKPQY LSDWWHQSVN WGSYHTRFG PQIRNDTYLE 60 

40 YEAFAKKDWF DFYGYADAPV FFGGNSDAKG IWNHGSPLFM EIEPRFSIDK LTNTDLSFGP 12 0 

FKEWY FANNY IYDMGRNKDG RQSTWYMGLG TDIDTGLPMS LSMNVYAKYQ WQNYGAANEN 180 

EWDGYRFKIK YFVP I TDLWG GQLSYIGFTN FDWGSDLGDD SGNAINGIKT RTNNSIASSH 240 

ILALNYDHWH YSWARYWHD GGQWNDDAEL NFGNGNFNVR STGWGGYLW GYNF 2 94 



<212> Type : PRT
<211> Length : 294

SequenceName : SEQ ID 71
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MLSTQFNRDN QYQAITKPSL LAGCIALALL PSAAFAAPAT EETVXVEGSA TAPDDGENDY 60 

55 SVTSTSAGTK MQMTQRDIPQ SVTIVSQQRM EDQQLQTLGE VMENTLGISK SQADSDRALY 12 0 

YSRGFQIDNY MVDGIPTYFE SRWNLGDALS DMALFERVEV VRGATGLMTG TGNPSAAINM 180 

VRKHAT S RE F KGDVSAEYGS WNKERYVADL QSPLTEDGKI RARIVGGYQN NDSWLDRYNS 240 

EKTFFSGIVD ADLGDLTTLS AGYEYQRIDV NSPTWGGLPR WNTDGSSNSY DRARS TAPDW 3 00 

AYNDKE INKV FMTLKQRFAD TWQATLNATH SEVEFDSKMM YVDAYVNKAD GMLVGPYSNY 360 

60 GPGFDYVGGT GWNSGKRKVD ALDLFADGSY ELFGRQHNLM FGGSYSKQNN RYFSSWANIF 420 

PDEIGSFYNF NGNFPQTDWS PQSLAQDDTT HMKSLYAATR VTLADPLHLI LGARYTNWRV 4 80 

DTLTYSMEKN HTTPYAGLVF DINDNWSTYA SYTSIFQPQN DRDSSGKYLA PITGNNYELG 540 

LKSDWMNSRL TTTLAIFRIE QDNVAQSTGT PIPGSNGETA YKAVDGTVSK GVEF ELNGAI 600 

TDNWQLTFGA TRYIAEDNEG NAVNPNLPRT TVKMFTSYRL PVMPELTVGG GVNWQNRVYT 660 

65 DTVTPYGTFR AEQGS YALVD LFTRYQVTKN FSLQGNVNNL FDKTYDTNVE GSIVYGAPRN 72 0 

FSITGTYQF 729 
<212> Type : PRT 
<211> Length : 729

SequenceName : SEQ ID 72
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MARFQFKNRK NNGLIFFISF MVMGEAAIAA PLPQWANAPA VTPVAQLSLQ ESILRAFARN 60 
10 PGVTQQAAQI GIGEAQIDEA KSAWYPHVGL TGNAGPSRQT DSSGRLDNW SYGITLTQLV 120 
YDFGKTNNDI NLQTAARDSY RFKLMATLTD VAEKTATAYM EVSRYQALCD AAQRNIHSLE 180 
NVYNMAALRA NAGLNSSSDE LQAQTRIAGM RSTLEQYQAQ MASAKAQLAV LTGVQPEAIA 240 
APPAELAEQP VSLKNIDYQS IPLVLAAENL RQSAQYGVEK TKAQYWPTLS IQGGKTRYQT 3 00 

SDRSYWDDQL QLMVNAPLYQ GGAVSAQVQQ AEGQQKISAS QVEQAKLDVL QRASV&YANW 3 60 

15 TGARGREEAG LAQSESAHKT RDVYQNEYKL GKRSLNDLLT VEQDVFQAQS AEIMANYDGW 42 0 

VAAVNYAAAV NNLIPLAGIK QGLYNDLPDL K 451 
<212> Type : PRT 
<211> Length : 451 

SequenceName : SEQ ID 73 
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MAKFTPSFSG IKGRALFSLL FAAPMIHATD TATTKDGETI TVTADANTAT EATDGYQPLS 60 

TSTATLTDMP MLDIPQWNT VSDQVLENQN ATTLDEALYN VSNWQTNTL GGTQDAFVRR 120 

GFGANRDGSI MTNGLRTVLP RSFNAATERV EVLKGPASTL YGILDPGGLX NWTKRPEKT 18 0 

FHGSVSATSS SFGGGTGQLD ITGPIEGTQL AYRLTGEVQD EDYWRNFGKE RSTFIAPSLT 240 

30 WFGDNATVTM LYSHRDYKTP FDRGTI FDLT TKQPVNVDRK IRFDEPFNIT DGQSDLAQLN 3 00 

AEYHLNSQWT ARFDYSYSQD KYSDNQARVT AYDATTGTLT RRVDATQGST QRMHS TRADL 3 60 

QGNVDIAGFY NEILGGVSYE YYDLLRTDMI RCKNAKDFNI YNPVYGNTSK CTTVSASDSD 420 

QTIKQESYSA YAQDALYLTD NWIAVAGIRY QYYTQYAGKG RPFNVNTDSR DEQWTPKLGL 4 80 

VYKLTPSVSL FANYSQTFMP QSSIASYIGD LPPESSNAYE VGAKFELFDG I TAD I ALFD I 540 

35 HKRNVLYTES IGDETIAKTA GRVRSRGVEV DLAGALTENI NIIASYGYTD AKVLEDPDYA 600 

GKPLPNVPRH TGSLFLT YD I HNMPGNNTLT FGGGGHCVSR RSATNGADYY LPGYFVADAF 660 

AAYKMKLQYP VTLQLNVKNL FDKTYYTSSI ATNNLGNQIG DPREVQFTVK MEF 713 

<212> Type : PRT 
<211> Length : 713

SequenceName : SEQ ID 74
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MRTLQGWLLP VFMLPMAVYA QEATVKEVHD APAVRGSIIA NMLQEHDNPF TLYPYDTNYL 60 
IYTQTSDLNK EAIASYDWAE NARKDEVKFQ LSLAFPLWRG ILGPNSVLGA. SYTQKSWWQL 12 0 

50 SNSEESSPFR ETNYEPQLFL GFATDYRFAG WTLRDVEMGY NHDSNGRSDP TSRSWNRLYT 180 
RLMAENGNWL VEVKPWYWG NTDDNPDITK YMGYYQLKIG YHL GDAVL SA KGQYNWNTGY 240 
GGAELGLSYP ITKHVRLYTQ VYSGYGESLI DYNFNQTRVG VGVMLNDLF 289 
<212> Type : PRT 
<211> Length : 289 

SequenceName : SEQ ID 75

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MAVQKNVIKG I LAGTFALML SGCVTVPDAI KGSSPTPQQD LVRVMSAPQL YVGQEARFGG 60 

KWAVQNQQG KTRLEIATVP LDSGARPTLG EPSRGRIYAD VNGFLDPVDF RGQLVTWGP 12 0 

ITGAVDGKIG NTPYKFMVMQ ATGYKRWHLT QQVIMPPQPI DPWFYGGRGW PYGHGGWGWY 18 0 

NPGPARVQTV VTE 193
<212> Type : PRT 
<211> Length : 193 
SequenceName : SEQ ID 76
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MRKQWLGICI AAGMLAACT S DDGQQQTVSV PQPAVCNGPI VEISGADPRF EPLNATANQD 60 
YQRDGKSYKI VQDPSRFIQA GLAAIYDAEP GSNLTAS GEA FDPTQLTAAH PTLPIPSYAR 120 

10 ITNLANGRMI WRINDRGPY GNDRVISLSR AAADRLNTSN NTKVRIDPII VAQDGSLSGP 180 
GMACTTVAKQ TYALPAPPDL SGGAGTSSVS GPQGDILPVS NSTLKSEDPT GAPVTSSGFL 240 
GAPTTLAPGV LEGSEPTPAP QPVVTAPSTT PATSPAMVTP QAASQSASGN FMVQVGAVSD 3 00 

QARAQQYQQQ LGQKFGVPGR VTQNGAVWR I QLGPFANKAE ASTLQQRLQT EAQLQSFITT 3 60 

AQ 362

<212> Type : PRT
<211> Length : 362

SequenceName : SEQ ID 77 
SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MIKRVLWSM VGLSLVGCVN NDTLSGDVYT AS EAKQVQNV SYGTIVNVRP VQIQGGDDSN 60 
25 VIGAIGGAVL GGFLGNTVGG GTGRSLATAA GAVAGGVAGQ GVQSAMNKTQ GVELE I RKDD 120 

GNTIMWQKQ GNTRFSPGQR WLASNGSQV TVSPR 155 

<212> Type : PRT 

<211> Length : 155 

SequenceName : SEQ ID 78 
SequenceDescription :



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MSKATEQNDK LKRAIIISAV LHVILFAALI WSSFDENIEA SAGGGGGSSI DAVMVDSGAV 60
VEQYKRMQSQ ESSAKRSDEQ RKMKEQQAAE ELREKQAAEQ ERLKQLEKER LAAQEQKKQA 120
EEAAKQAELK QKQAEEAAAK AAADAKAKAE ADDKAAEEAA KKAAADAKECK AEAEAAKAAA 180
EAQKKAEAAA AALKKKAEAA EAAAAEARKK AAAEKAAADK KAAEKAAAEK AAADKKAAAE 240
KAAADKKAAA AKAAAEKAAA AKAAAEADDI FGELSSGKNA PKTGGGAKGN NASPAGSGNT 300
KNNGASGADI NNYAGQIKSA IESKFYDASS YAGKTCTLRI KLAPDGMLLD IKPEGGDPAL 360
CQAALAAAKL AKIPKPPSQA VYEVFKNAPL DFKP 394
<212> Type : PRT
<211> Length : 394

SequenceName : SEQ ID 79

SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MMKFKKCLLP VAMLASFTLA GCQSNADDHA ADVYQTDQLN TKQETKTVNI ISILPAKVAV 60 
DNSQNKRNAQ AFGAL I GAVA GGVI GHNVGS GSNSGTTAGA VGGGAVGAAA GSMVNDKTLV 120 
EGVSLTYKEG TKVYTSTQVG KECQFTTGLA WITTTYNET RIQPNTKCPE KS 172 


<212> Type : PRT 
<211> Length : 172 

SequenceName : SEQ ID 80 

SequenceDescription : 

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

65 MLLSIITVAF RNLEGIVKTH ASLAHLAQAE DISFEWIWD GGSNDGTREY LENLNGIYNL 6 0 

RFVSEPDNGI YDAMNKGIAM AQGKFAL FLN SGDIFHQDAA YFVRKLKMQK DNVMI TGDAL 12 0 

LDFGDGHKIK RSAKPGWYIY HSLPASHQAI FFPVSGLKKW RYDLEYKVSS DYALAAKMYK 180 
AGYAFKKLNG LVSEFSMGGV STTNNMELCA DAKKVQRQIL HVPGFWAELS WHLRQRTTSK 240
TKALYNKS 248 
<212> Type : PRT 
<211> Length : 248 

SequenceName : SEQ ID 81 

SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd
<400> PreSequenceString : 

MKLTTLQTLK KGFTLIELMI VIAIIAILAT IAIPSYQNYT KKAAVSELLQ ASAPYKADVE 60 
LCVYSTNETT SCTGGKNGIA AD I KTAKGYV ASVITQSGGI TVKGNGTLAN ME Y I LQAKGtT 120 
AAAGVTWTTT CKGTDASLFP ANFCGSVTK 149

<212> Type : PRT

<211> Length : 149 

SequenceName : SEQ ID 82 

SequenceDescription : 

Sequence

<213> OrganismName : Haemophilus influenzae Rd
<400> PreSequenceString :

MLNKKFKLNF IALTVAYALT PYTEAALVRD DVDYQI FRDF AENKGRF SVG ATNVEVRDKU 60 

25 NHSLGNVLPN GIPMIDFSW DVDKRIATLI NPQYWGVKH VSNGVSELHF GNLNGNMNNG- 12 0 

NAKSHRDVSS EENRYFSVEK NEYPTKLNGK AVTTEDQTQK RREDYYMPRL DKFVTEVAPX 180 

EASTASSDAG TYNDQNKYPA FVRLGSGSQF I YKKGDNYS L ILNNHEVGGN NLKLVGDAYT 240 
YGIAGTPYKV NHENNGLIGF GNSKEEHSDP KGILSQDPLT NYAVLGDSGS PLFVYDREKG- 3 00 

KWLFLGSYDF WAGYNKKSWQ EWNIYKPEFA KTVLDKDTAG SLTGSNTQYN WNPTGKTSVX 3 60 

30 SNGSESLNVD LFDSSQDTDS KKNNHGKSVT LRGSGTLTLN NNIDQGAGGL FFEGDYEVKG 420 

TSDSTTWKGA GVSVADGKTV TWKVHNPKSD RLAKI GKGTL I VEGKGENKG SLKVGDGTVX 48 0 
LKQQADANNK VKAFSQVGIV SGRSTWLND DKQVDPNSIY FGF RGGRLDA NGNNLTFEHX 540 
RNIDDGARLV NHNTSKTSTV- TITGESLITD PNTITPYNID APDEDNPYAF RRI KDGGQL Y" 60 0 

LMLENYTYYA LRKGASTRSE LPKNSGESNE NWLYMGKTSD EAKRNVMNHI NNERMNGFNG 660 

35 YFGEE EGKNN GNLNVTFKGK SEQNRFLLTG GTNLNGDLKV EKGTLFLSGR PTPHARDIAG 720 
ISSTKKDQHF AENNEWVED DWINRNFKAT NINVTNNATL YSGRNVANIT SNITASDNAEC 780 
VHIGYKAGDT VCVRSDYTGY VTCTTDKLSD KALNS FNATN VSGNVNLSGN ANFVLGKANIL. 840 
FGTISGTGNS QVRLTENSHW HLTGDSNVNQ LNLDKGHIHL NAQNDANKVT TYNTLTWSLa 9 00 

SGNGSFYYLT DLSNKQGDKV WTKSATGNF TLQVADKTGE PTKNELTLFD ASNATRNNLKT 960 

40 VSLVGNTVDL GAWKYKLRNV NGRYDLYNPE VEKRNQTVDT TNITTPNMIQ ADVPSVPSNKT 1020 

EE I ARVETPV PPPAPATPSE TTETVAENSK QESKTVEKNE QDATETTAQN GEVAEEAKPS 1080 

VKANTQTNEV AQSGSETEET QTTE I KETAK VEKEEKAKVE KDEIQEAPQM ASETSPKQABC 1140 

PAPKEVSTDT KVEETQVQAQ PQTQSTTVAA AEATSPNSKP AEETQPSEKT NAEPVTPWS 12 00 

KNQTENTTDQ PTEREKTAKV ETEKTQEPPQ VASQASPKQE QSETVQPQAV LES ENVPTVNT 1260 

45 NAEEVQAQLQ TQTSATVSTK QPAPENSINT GSATAITETA EKSDKPQTET AASTEDASQHE 13 20 

KANTVADNSV ANNSESSDPK SRRRRSISQP QETSAEETTA ASTDETTIAD NSKRSKPNRR 13 80 

SRRSVRSEPT VTNGSDRSTV ALRDLTSTNT NAVISDAMAK AQFVALNVGK AVSQHISQLE 1440 

MNNEGQYNW VSNTSMNENY SSSQYRRFSS KSTQTQLGWD QTISNNVQLG GVFTYVRNSIST 1500 

NFDKASSKNT LAQVNFYSKY YADNHWYLGI DLGYGKFQSN LKTNHNAKFA RHTAQFGLTA 1560 

50 GKAFNLGNFG ITPIVGVRYS YLSNANFALA KDRIKVNPIS VKTAFAQVDL SYTYHLGEFS 1620 

VTPILSARYD TNQGSGKINV NQYDFAYNVE NQQQYNAGLK LKYHNVKLSL IGGLTKAKQA 168 0 

EKQKTAELKL SFSF 1694 
<212> Type : PRT 
<211> Length : 1694 

SequenceName : SEQ ID 83

SequenceDescription :

Sequence

<213> OrganismName : Haemophilus influenzae Rd
<400> PreSequenceString : 

MALVNKI KTL SSVGILAATL FLAGCQAQSN ILAFTPPAPS AS MNVNRTAV VSVTTKDSRA 60 
IQEIASYTKH GELIKLNASP SVTQLFQQVM QQNLISKGFR VGQLNGSNAW VTVDVREFGT 120 
QVEQGNLRYK LNTKIQATVY VQGAKGS YNK SFNVTHSQEG VFNAGNDEIH KVLSQTFNDI 18 0 

VNNIYQDQEV AAAINQYSN 199

<212> Type : PRT 
<211> Length : 199 
SequenceName : SEQ ID 84
SequenceDescription :

Sequence

<213> OrganismName : Haemophilus influenzae Rd
<400> PreSequenceString :

MLCWIGYKNG ILPQQNSTLY PWLNPSKCGV IFDGFQLVGD DFNSDQTAEN TSPAWQVLYT 60 
THLQSCSPIH SGENFAPIPL YKQLKNQPHL SQDLIKWQEN WQACDQLQMN GAVLEQQSLA 120 
EISDHQSTLS KHGRYLiAQE I EKETGIPTYY YLYRVGGQSL ESEKSRCCPS CGAJSfWALKDA 18 0 

IFDTFHFKCD TCRLVSNLSW NFL 203

<212> Type : PRT 
<211> Length : 203 

SequenceName : SEQ ID 85 

SequenceDescription : 



Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString :

MGAFAFASVT NAN I YAEGD I GLSQTKANGS NNTRVGPRVS VGYKVGNTRV AGDYTHHGKV 60 
DGTKIQGLGA SVLYDFDTNS KVQPYVGARV ATNQFKYTNR AEQKFKSSSD IKLGYGWAG 12 0 

AKYKLDGNWY ANGGVEYNRL GNFDSTKVNN YGAKVGVGYG F 161 
<212> Type : PRT 
<211> Length : 161 

SequenceName : SEQ ID 86 

SequenceDescription : 



Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString :

MKKLLI AS LL FGTTTTVFAA PFVAKDIRVD GVQGDLEQQI RAS LPVRAGQ RVTDNDVANI 60 

VRSLFVSGRF DDVPCAHQEGD VLWSWAKS IISDVKIKGN SIIPTEALKQ NLDANGFKVG 120 

DVLIREKLNE FAKSVKEHYA SVGRYNATVE PIVNTLPNNR AEILIQINED DKAKLASLTF 180 

KGNESVSSST LQEQMELQPD SWWiCLWGNKF EGAQFEKDLQ SIRDYYLNNG YAKAQITKTD 240 

VQLNDEKTKV NVTIDVNEGL QYDLRSARII GNLGGMSAEL EPLLSALHLN DTFRRSDIAD 3 00 

VENAIKAKLG ERGYGSATVN SVPDFDDANK TLAITLWDA GRRLTVRQLR FEG3STTV SAD S 3 60 

TLRQEMRQQE GTWYNSQLVE LGKIRLDRTG FFETVENRID PINGSNDEVD WYKVKERNT 420 

GSINFGIGYG TESGISYQAS VKQDNFLGTG AAVS I AGTKN DYGTSVNLGY TEPYFTKDGV 48 0 

SLGGNVFFEN YDNSKSDTSS NYKRTTYGSN VTLGF PVNEN NSYYVGLGHT YNKISNFALE 540 

YNRNLYIQSM KFKGNGIKTN DFDFSFGWNY NSLNRGYFPT KGVKASLGGR VTIPGSDNKY 60 0 

YKLSADVQGF YPLDRDHLWV VSAKASAGYA NGFGNKRLPF YQTYTAGGIG SLRGFAYGSI 660 

GPNAI YAEHG NGNGTFKKIS SDVIGGNAIT TASAELIVPT PFVSDKSQNT VRTSLFVDAA 72 0 

SVWNTKWKSD KSGLDNNVLK SLPDYGKSSR IRASTGVGFQ WQSPIGPLVF SYAKPIKKYE 780 

NDDVEQFQFS IGGSF 795 
<212> Type : PRT 
<211> Length : 795 

SequenceName : SEQ ID 87 

SequenceDescription : 



Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString :

MLKKTSLIFT ALLMTGCVQN ANVTTPQAQK MQVEKVDKAL QKGEADRYLC QDDRWRWH 60 
ATHKKYKKNL HYVTVTFQGV SEKLTLMISE RGKNYANIRW MWQERDDFST LKT3STLGEILA 120 
TQCVSQTSER LSGQ 134 
<212> Type : PRT 
<211> Length : 134 

SequenceName : SEQ ID 88 

SequenceDescription : 



Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString :
MRIIIIFFMG LNMTNFRLER ACLFRYAWAN GRCCLCSSTN QPTNQPTNQP TNQPTNQPTN 60

QPTNQNSNVS EQLEQINVSG STENSDTKTP PKIAETVKTA KTLEREQANN IKDIVKYETG 120 

VTWEAGRFG QSGFAIRGVD ENRVAINIDG LRQAETLSSQ GFKELFEGYG NFNNTRNGAE 18 0 

IETLKEVNIT KGADSIKNGS GSLGGSVIYK TKDARDYLIN KDYYVSYKKG YATENNQSFD 240 

5 TLTLAGRYKK FDVLWT TS R NGHELENYGY KNYNDKIQGK KREKADPYKI EQDSTLLKLS 3 00 

FNPTENHRFT FAADLYEHRS RGQDLSYTLK YQRSGMETPE VDSRHTMDKT KRRNISFSYE 3 60 

NFSQTPFWDT LKLTYSDQRI KTRARTDEYC DAGVRHCEGT DNPTGLKVTN GKI TRRDGSD 42 0 

LQFEEKNNTA KSSDKTYDFK KFIDTDKRVI DDKLVLNNPS DTWYDCSIFN CENNAKIKVF 480 

KGNNYYGYDG KWKEVDLEIK ELNGKKFAKI KDNDRKI KS I LPSSPGYLER LWQERDLDTN 540 

10 TQQLNLDLTK DFKIWHIEHN LQYGGSYNTA MKRMVNRAGN DASDVQWWAT PTLGEDSWTG 600 

KPHTCATTYE WNANLCPRVD PEFSYLLPIK TTGKSVYLFD NFVITDYLSF DLGYRYDNIH 660

YQPKYKHGIT PKLPDDIVKG LFIPLPNNSN SDPNKVKENV QQNIDYIAKQ NKKYKAHSYS 720

FVSTIDPTSF LRLQLKYSKG FRTPTSDEMY FTFKHPDFTI LPNTDLKPEI AKTKEIAFTL 780

HNDDWGFIST SLFKTNYKNF JDLIFKKQET FSYGGSGRGE TLPFSLYQNI NRDNAS3VPCGI 840

EINSKVFLGK MAKFMDGFNL SYKYTYQKGR MNGNIPMNAI QPRTMVYGLG YDHPNHKFGF 900

DFYTTKVASK NPEDTYNMFY KEENKKDSTI KWRSKSYTIL DLIGYVQPIK NLTIRAGVYN 960

LTNRKYITWD SARSIRSFGT SNVIDQSTGL GINRFYAPGR NYKMSVQFEF 1010
<212> Type : PRT 
<211> Length : 1010 

SequenceName : SEQ ID 89

SequenceDescription :

Sequence

<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MTYRNGKIDL KERFS KNRS F KGIKKKIAKK YTIKNSLSII YSLKTHSNSS LSINKKIFLG 60 

LGFVSAIiSAQ SEDYNSSVYW LNSVNENNNN KSYYISPLRT WAGGNRSFTQ NYNNSQLYIG 12 0 

TKNASATPNH SSVWFGEKGY IGFITGVFKA RD I F I TGAVG SGNELKTGGG AILVFESSNE 180 

30 LTTNGAYFQN NRAGTQTSWI NLISNNSVNL TNTDFGNQTP NGGFNVMGRK I T YNGGSVNG 240 

GNFGFDNVDS NGATTISGVT FNNNGALTYK GGNGIGGSIT FTNSNINHYK LNLNANSVTF 3 00 

NNSTLGSMPN GNANTIGNAY ILNANNITFN NLTFNGGWFV FNRSDAHVNF QGTTTINNPT 3 60 

SPFVNMTGKV TINPNAIFNI QNYTPTIGNA YTLFSMKNGN IAYDDVNNLW Nil RLKNTQA 42 0 

TKDNSKNATS NNNTHTYYVT YNLGGTLYHF RQIFSPDSIV LQSVYYGANN LYYTNSVNIH 48 0 

35 DNVFNLKNIN DDRADTIFYL NGLNTWNYTQ ARFAQTYGGK NSALVFNATT PWANGAIPKS 540 

NSTVRFGGYE GVNWGKTGYI TGTFTADRVY ITGNMMSGNG AQTGGGATLN FVGATEINIA 600

GATFKNLKTT SQNSYMTFMA LGNGSGSGKI NVSQSDFYDW TDGGYDFTGN GVFDSVNFNK 660

AYYKFQGAEN SYNFKNTNFL AGNFKFQGKT TIEKSVLNDA SYAFDGVNNA FNEDKFNGGS 720

FNFNHAEQTN AFNNNSFSGG SFSFNAKQVD FNGNS FNGGV FNFNNTPKAS FTNDTFNVNN 780 

40 QFKINGAQTD FTFS KGWFN MQGLLSSLSV GTTYQLLNAK SVGYKDNNNA LYQMLRWTSG 840 

ENPSGKLVDE NKTAPNSAKI YNVQFTDNGL TYYIKENFNN GITLTRLCTL GYTHCVNIDN 900 

DAFNLKNVNN NASNTVFYLN GMTTWKTAGT GVFTQDYSGT NSVLVFNQTT PFLAGANPTS 960 

NSWGFGKTS GAEWGLVGYI QGVFKANQID ITGTIRSGNG AKTGGGATLV FNAQERLNIA 102 0 

NANLNNDKAG LQNSWMNFIV NNGNLNVTNA NFSNQTPHGG FNLKANNITW DKGSVSGGGN 108 0 

45 FGVDNANANG NAVIKNVNFS DNGTLIYKGG ENSAGNSLTL ENNTFNSYNI NAKAQNLI FN" 1140 

NNSFNSGSYS FNDTKNVTFK GTNTLINSDP FSRLKGSVSI DNNSIFNIER DLTDKTTYTL 120 0 

LSGDNIKYNN QALADNVFSK NLWDLIHYDG EQGTLLRTDN NTYFVQFTQS NGQKFVFEET 12 60 

FNPGSITYKY FTIHSSPFHT EADSKDIWNQ VRKQFDFIPG KTPVCVGVCY IAPYKNQDLI 1320 

GS S AFAWSLN FGATWGTLL LGSAQEKANN NGGS I WFGKN NLLYLHGNFN ATNIFLTNNF 1380 

50 NVGNPNAGGG ATINFNADET LSADGLNYTN FQTVAMGLQT SASQHSWANF NSKLSMEIKN 1440 

SNFRDFTWGG FRFNSGRITF ENTTFSGWTN INGATESGSS YVNMVANTDL IFTDSILGGG 1500 

IRYDLKANNI I FNNTQMWD VSKNVNQSSL NGNVTFNHSR LSVKPNAAIN IGGDQTQTTL 1560 

ENASSLSFYN DSVAMFNGTT AFNGVSYLNL NPNAQVSFNQ ANFNNANVTF YGIPLFGKTP 1620

NFGNSVRLIN FKGDAKFNQA TLNLRAKNIH LNFQGASTFE NNSTMNLAES SQASFNALSV 1680

EGETNFNLNG SSLLSFNGNS VFNAPVNFYA NNSQISFTHS ATFNADASFD LGNNSTLNFQ 1740

SVLLNSALNL LGNGGNNLAI NAKGNFSFGS QGILNLSYMN LFGGDKKASV YDVLQAQNID 1800

GLRGNNGYEK IRFYGIQIEK ADYSFNNGVH SWSFTNPLNT TETITETLHN NRLKVQISQN 1860 

GASNNAMFNL APSLYDYQQN PYDESENSYN HTSDKAGTYY LSSSIKGFGK NNEIPGTYNA 1920

QNQPLQALHI YNQAISKQDL NMIASLGKEF LPKVAKLIAS GALDNLNLNS PDSFETIFSI 1980 

LKEYGITLNQ ANWKSLLKII NNFSNTANYH FSQGSIjWGA IKEGQTNTNS WWFGGDGYK 2040

NPCAVGDNTC QMFRQTNLGQ IiLNSSVPYLG YINANFKAKN IYITGTIGSG NAWGSGGSAN 2100

VSFESATNLV LNQANIDAQG TDKIFSYLGK EGIDKLFGEK GLGNVLSNIV YEESLNDNAI 2160 

PKDLANMIPK DLGSKTLSSL LSPTEVNNLL GVSAFKNAIM EILNSKTVGD VFGENGLLNA 2220

LDPVKRKEID QMLLEQIQAH SSGFEKFIVK TLGIENVENF INNWYGKQSL SSFANNFVPG 2280

GLNQALDKIG SSSDAKDLQS FLDKTTFGDI LNQMINQAPL INKLISWLGP QDLSVLVNIA 2340

LNSITNPSKE LLGAISGMGQ KVLNDLLGEG WNKIMSNQV LGQMINKIIA DKGFGGVYHQ 2400

GLGSILPKSL QDELKKLGMG SLLKPKGLHN LWQKGNFNFV AKNHVFVNNS LFSNATGGEL 2460
NFVAGKSIIF NGKNTINFTQ YQGRLSFVSK DFSNISLDTL NATNGLTLNA SKNDISVQKG 2520

QICVNVLDCM TAKGKTTQTN SSSSATAPTN ETLEVSANNF AFLGTIKANG LVDFSKVLQN 2580

TTIGTLDLGP NATFKANNLI VNNAFNNNSN YRANISGNFN VAKGATFSTN ENGLNVGGNF 2640

NSEGPLIFNL NNPTHQTIIN VTGTSTIMSY NNQALINFNT QLKQGAYTLI NANRMVYGYD 2700

NQTILGGSLS DYLKLYTLID FNGKRMQLNG DSLSYDNQPV SIKDGGIiWS FKDNQGQMVY 2760 

SSILYDKIQV TVSDKPMSIQ APSLEYYVKR IQGSAGLNAI KSAGNNSIMW LSELFAAKGG 2820

NPLFAPYYLQ DNPTEHIVTL MKDITSALGM LSNSNLKNNS TDVLQLNTYT QQMSRLAKLS 2880

NFASFDSTDF SERLSSLKNQ RFADAVPNAM DVILKYSQRD KLKNNLWATG VGGVSFVENG 2940

TGTLYGVNVG YDRFVRGVIV GGYAAYGYSG FYERITSSKS DNVDVGLYAR AFIKKSELTF 3000

SVNETWGANK TQISSNDALL SMINQSYKYS TWTTNAKVNY GYDFMFKNKS IILKPQIGLR 3060

YYYIGMSGLE GVMNOTLYNQ FKANADPSKK SVLTIDFALE NRHYFNTNSY FYAIGGVGRD 3120

LLVNSMGDKL VRFIGNNTLS YRKGDLYNTF ANITTGGEVR LFKSFYANAG VGARFGLDYK 3180

MIDIIGNIGM RLAF 3194
<212> Type : PRT 
<211> Length : 3194 

SequenceName : SEQ ID 90 

SequenceDescription : 

Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MKQFKKKPKK IKRSHQNQKT ILKRPLWLMP LIiIGGFASGV YADGTDILGL SWGEKSQKVC 60

VHRPWYAIWS CDKWEEKTQQ FTGNQL1TKT WAGGNAANYY HSQNNQDITA NLKNDNGTYF 120

LSGLYNYTGG EYNGGNLDIE LGSNATFNLG ASSGNSFTSW YPNGHTDVTF SAGTINVNNS 180 

VEVGNRVGSG AGTHTGTATL NLNANKVTIN SNISAYKTSQ VNVGNANSVI TINSVSLNGD 240

TCSSLARVGV GANCSTSGPS YSFKGTTNAT NTTFSNSSGS FTFEENATFS GAKLNGGAFT 300

FNKKFNATNN TAFNSGSFTF KGTSSFNGAN FSNASYTFNN QATFQNSSFN GGTFTFNDQT 360

NQSTQHPQIQ NSSFSGSATT LKGFATFEQA FNNSNHQLTI QNASFNNATF NNTGKITIEK 420

DASFNNTSFN TPVDTNNMTI SGGVTLSGKN DLKNGATLDF GSSKITLTQG TTFNLTSLGS 480

EKSVTILNSR GGITYNHLLN HAINSLTNAL KTNESSSKPQ SFAQGLWDMI TYNGVTGQLL 540

NENAATSKPT DSSPSKSSTN STQVYQVGYK IGDTIYKLQE TFSHNSIIIQ ALESGTYTPP 600

PVINGSKFDL SASNYINADM PWYNHKYYIP KSQNFTESGT YYLPSVQIWG SYTNSFKQTF 660

SASNSNLVIG YNATWTDHNV SSSDTVAFGD TSGSALNGHC GPWPYYQCTG TTNGTYSAYH 720

VYITANLRSG NRIGTGGAAN LIFNGVDSIN IANATITQHN AGAYSSSMTF STQNMDNSQN 780

LNGLNSNGKL LVYGTTFTNQ AKDGKFIFNA GQATFENTNF NGGSYQFSGD SLNFSNNNQF 840

NSGSFEIGAK NTIFNNANFN NSTSFNFNNS SATTSFVGDF TNANSNLQIA GNAVFGNSTN 900

GSQNTANFNN TGSVNIAGNA TFDNWFNSP TNTSVKGKVT LNNITLKNLN APLSFGDGTI 960

VFSAHSVINI GEAITNGNPI TLVSSSKAIE YNDAFSKNLW QIiINYQGHGA SSEKLVSSAG 1020

NGVYDWYSF NNQTYNFQEV FSPNSISIRR LGVGMVFDYV DMEKSDRLYY QNALGFMTYM 1080

PNSYNNNLGN LNNTIYYYDN SIDFYASGKT LFTKAEFSQT FTGQNSAIVF GAKNIWTSVS 1140

DAPQSNVIIR FGDNKGAGSN DASGHCWNLQ CIGFITGHYE AQKIYITGSI ESGNRISSGG 1200

GASLNFNGLQ GILLTNATLY NRAAGTQSSS MNFVSNSANI QAQNSYFIDD TAQNKGNPNF 1260

SFNALNLDFS NSSFRGYVGQ TQSVFKFNAV NAISFTNSSN LSSGLYQMQA KSVLFDNSNL 1320

SVSVGTSSIK ANAINLSQNA SINASNHSTL ELQGDLNLND TSSLNLNQSA INVSNNATIN 1380

DYASLIASNG SHLNFNGAVN FNSANITTSL SSSSIVFKGA VSLRGQFNLS NNSSLDFQGS 1440

SAITSNTAFN FYDNAFSQSP ITFHQALDIK VPLSLGGNLL NPNNSSVLNL KNSQLVFSDQ 1500

GSLNIANIDL LSDLNGNKNR VYNIIQADMN GNWYERINFF GMRINDGIYD AKNQTYSFTN 1560

PLNNALKITE SFKNNQLSVT LSQIPGIKNT LYNIGSEIFN YQKVYNNANG VYSYSDDAQG 1620

VFYLTSSVKG YYNPNQSYQA SGSNNTTKNN NLTSESSVIS QTYNAQGNPI SALHVYNKGY 1680

NFSNIKALGQ MALKLYPEIK KILGNDFSLS SLSNLKGDAL NQLTKLITPS DWKNINELID 1740

NANNSWQNF NNGTL 1 1 GAT KIGQTDTNSA WFGGLGYQK PCDYTDIVCQ KFRGTYLGQL 1800

LESISADLGY IDTTFNAKEI YLTGTLGSGN AWGTGGSASV TFNSQTSLIL NQANIVSSQT 1860

DGIFSMLGQE GINKVFNQAG LANILGEVAM QSINKAGGLG NLIVNTLGSD SVIGGYLTPE 1920

QKNQTLSQLL GQNNFDNLMN DSGLNTAIKD LIRQKLGFWT GLVGGLAGLG GIDLQNPEKL 1980

IGSMSINDLL SKKGLFNQIT GFISANDIGQ VISVMLQDIV KPSDALKNDV AALGKQMIGE 2040

FLGQDTLNSL ESLLQNQQIK SVLDKVLAAK GLGSIYEQGL GDLIPNLGKK GIFAPYGLSQ 2100

VWQKGDFSFN AQGNVFVQNS TFSNANGGTL SFNAGNSLIF AGNNHIAFTN HSGTLNLLSN 2160

QVSNINVTML NASNGLKINA TNNNVSVSQG NLFINASCVQ QSDPTTASAT NPCTTAQNNA 2220

SSSNASNNAP IALNNNDESL WTANGFNFS GNIYANGWD FSKIKGSANV KNLYLYNNAQ 2280

FQANNLTISN QAVLEKNASF VTNNLNIQGA FNNNATQKIE VLQNLVIASN ASLSTGIYGL 2340

EVGGALNNLG AIHFNLENSQ TPVNPLIQVG GIINLNTTQT PFMNVSVANG GTYTLLKSSR 2400

YIDYNINPNS LQSYLKLYTL ININGNHIEE KNGVLTYLGQ RVLLQDKGLL LSVALPNSNN 2460

ASQNNILSLS VLHNQIKMSY GNKVMDFTPP TLQDYIVGIQ GQSALNQIEA VGGNNAIKWL 2520

STLMMETKEN PLFAPIYLEN HSLNEILGVT KDLQNTASLI SNPNFRNNAT SLLEMASYTQ 2580

QTSRLTKLSD FRAREGESNF SERLLELKNK RFSDPNPSEV FVKYSQLSKH PNNLWIQGVG 2640 

GASFISGGNG TLYGLNVGYD RLVKSVILGG YVAYGYSGFN GNIMHSLANN VDVGMYARAF 2700 
LKRNEFTLSA NETYGGNASH INSSNSLLSV
LKPQVGLSYH FIGLSGMKGK MQNPAYQQFV 
VTARLGRDLL IKAKGDNWR FVGENTLLYR 
LKMGLQYQDL NITGNVGMRV AF 
<212> Type : PRT

<211> Length : 2902 

SequenceName : SEQ ID 91
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MAFKKARLIS RFISKGSFKL NKISKKFFTL NQILKRFKPI ICRHKKTKSIE KPFNKNKSFL 60

KASVLLIGAL GGLSHLRANE CRYWSWSSWS YQDNIESGPN SPTHNSYCLF SSAQGSGTYY 120

LNTLTTYSAG GASFTQKFNG GTLDIGGNIR FGGTGINGGD VGYITGTYNA QTMNFNSSHI 180

TTGNSYADGG GTTLNFNATN NITINQASFD NSDAGTQKSY MNFKGSNIKI SGSSFTDDTN 240

GGFNFSGNNN NSTISFNQTS FNQGTYNFSN SATLSFNNSN FNQGTYHFNS AQSTFENSNF 300

NQGTYNFNDN TSFNNDTFNQ GTYNFNSSKV SFSGANTLNS SSPFASLKGS VSFNSGAIFN 360

LNQTLNNNQT YDILTTNGAI QYGVYQSYLW DLINYKGDKA ISHVEVSNNT YDVTFDINGQ 420

DETLQETFSN QSIITQFLGD DLQQQAQQTY QEDVANSQNA LNKVASDNTI ANNDTSYTQS 480 

SNPTILKDAQ GLENTNQQIQ QDEKALEKDL AQIKQLANST TGFNEQAFTQ AQKQEQQDEQ 540 

ALQNDENAFN TEQEGLEQAI ANAKHANPTP NPTPSPTPTP IKHTAPNTPP SQVPPTPPSQ 600 

NLPKTNVWNG VYWLQNKTYS NKGIYYIDPN LSGQSGQSGN TLSTYTANLL GRSFGVNANN 660

GTLIIGNNTE SVNDNGLIWI GHGGFGYITG TFSAANIYLT NNFKTGEGVS NSDGGGANIT 720

FKASDNITMD GLNYNNAETV TKMIQTGASQ HSYTTFDATN NISVTDSDFS DMTWGKFSFS 780 

AKNISFSNAS FSGFTNPGGS STISTNASNS LSFTDSRLNG GAIYNLQANS LIFNNTQAVF 840 

NVLYSRGTSN FNATTQLLGN TSFTLSSQSL LNFNGDTTLQ NNANITLGNK SQAAFKNSLT 900 

LDNNSNLSLD NQSVLNANGT SAFNNQASLN IYNGSQAAFS SLFFNGGTLS LNANSKLNAS 960 

SASFSNNTTI NLDDSVLNAN NTSSLMANIN FQGASQADFG GNTTIDTASF NFDSASSLNF 1020

NNLTANGALN FNGYAPSLTK ALMNVSGQFV LGNNGDINLS DINIFDNITK SVTYNILNAQ 1080 

KGITGISGAN GYEKILFYGM KIQNATYSDN NNIQTWSFIN PLNSSQIIQE SIKNGDLTIE 1140 

VLNNPNSASN TIFNIAPELY NYQDSKQNPT GYSYDYSDNQ AGTYYLTSNI KGLFTPKGSQ 1200 

TPQTPGTYSP FNQPLNSLNI YNKGFSSENL KTLLGILSQN SATLKEMIES NQLDNITNIN 1260

EVLQLLDKIK ITQAQKQALL ETINHLTDNI NQTFNNGNLV IGATQDNVTN STSSIWFGGN 1320

GYSSPCALDS ATCSSFRNTY LGQLLGSTSP YLGYINADFK AKSIYITGTI GSSNAFESGG 1380

SADVTFQSAN NLVLNKANIE AQATDNIFNL LGQEGIDKIF NQGNLANVLS QMAMEKIKQA 1440

GGLGNFIENA LSPLSKELPA SLQDETLGQL IGQNNLDDLL NNSGVMNEIQ NIISQKLSIF 1500 

GNFVTPSIIE NYLAKQSLKS MLDDKGLLNF IGGYIDASEL SSILGVILKD ITNPPTSLQK 1560 

D I GWANDLL NEFLGQDWK KLESQGLVSN IINNVISQGG LSGVYNQGLG SVLPPSLQNA 1620

LKENDLGTLL SPRGLHDFWQ KGYFNFLSNG YVFVNNSSFS NATGGSLNFV ANKSIIFNGD 1680 

NTIDFSKYQG ALIFASNGVS NINITTLNAT NGLSLNAGLN NVSVQKGEIC INLANCPTTK 1740 

NSSPANSSVT PTNESLSVHA NNFTFLGTII SNGAIDLSQV TNNSVIGTLN LNENATLQAN 1800 

NLTITNAFNN ASNSTANIDG NFTLNQQATL STNASGLNVM GNFNSYGDLV FNLSHSVSHA 1860

IINTQGTATI MANNNPLIQF NASSKEVGTY TLIDSAKAIY YGYNNQITGG SSLDNYLKLY 1920

ALIDINGKHM VMTDNGLTYN GQAVSVKDGG LWGFKDSQN QYIYTSILYN KVKIAVSNDP 1980 

INNPQAPTLK QYIAQIQGVQ SVDSIDQAGG NQAINWLNKI FETKGSPLFA PYYLESHSTK 2040 

DLTTIAGDIA NTLEVIANPN FKNDATNILQ INTYTQQMSR LAKLSDTSTF ARSDFLERLE 2100 

ALKNKRFADA IPNAMDVILK YSQRNRVKNN VWATGVGGAS FISGGTGTLY GINVGYDRFI 2160 

KGVIVGGYAA YGYSGFHANI TQSGSSNVNV GVYSRAFIKR SELTMSLNET WGYNKTFINS 2220

YDPLLSIIMQ SYRYDTWTTD AKINYGYDFM FKDKSVIFKP QVGLSYYYIG LSGLRGIMDD 2280 

PIYNQFRANA DPNKKSVLTI NFALESRHYF NKNSYYFVIA DVGRDLFINS MGDKMVRFIG 2340

NNTLSYRDGG RYNTFASIIT GGEIRLFKTF YVNAGIGARF GLDYKDINIT GNIGMRYAF 2399 

<212> Type : PRT

<211> Length : 2399 

SequenceName : SEQ ID 92
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MEIQQTHRKI NRPLVSLVLA GALISAIPQE SHAAFFTTVI IPAIVGGIAT GTAVGTVSGL 60 

LSWGLKQAEE ANKTPDKPDK VWRIQAGKGF NEFPNKEYDL YKSLLSSKID GGWDWGNAAR 120

HYWVKGGQWN KLEVDMKDAV GTYKLSGLRN FTGGDLDVNM QKATLRLGQF NGNSFTSYKD 180

SADRTTRVNF NAKNISIDNF VEINNRVGSG AGRKASSTVL TLQASEGITS SKNAEISLYD 240 



LNQRYNYNTW TTSVNGNYGY 
MHSNPSNESV LTLNMGLESR 
KGEIFNTFAS VI TGGEMHLW 



DFMFKQKSW 2760 
KYFGKNSYYF 2620 
RLMYVNAGVG 28 80 

2902 
GATLNLASNS VKLNGNVWMG RLQYVGAYLA PSYSTINTSK VQGEVDFNHL TVGDQNAAQA 300

GIIASNKTHI GTLDLWQSAG LNIIAPPEGG YKDKPNSTTS QSGTKNDKKE ISQNNNSNTE 360

VINPPNNTQK TETEPTQVID GPFAGGKDTV VNIFHLNTKA DGTIKVGGFK ASLTTNAAHL 420

MIGKGGVNLS NQASGRTLLV ENLTGNITVD GPLRVNNQVG GYALAGSSAN FEFKAGVDTK 480 

NGTATFNNDI SLGRFVNLKV DAHTANFKGI DTGNGGFNTL DFSGVTDKVN INKLITASTN 540 

VAVKNFNINE LIVKTNGISV GEYTHFSEDI GSQSRINTVR LETGTRSIFS GGVKFKSGEK 600 

LVINDFYYSP WNYFDARNVK NVEITRKFAS STPENPWGTS KLMFNNLTLG QNAVMDYSQF 660 

SNLTIQGDFI NNQGTINYLV RGGKVATLNV GNAAAMMFNN DIDSATGFYK PLIKiNSAQD 720 

L I KNTEHVLL KAKIIGYGNV STGTNGISNV NLEEQFKERL ALYNNNNRMD TCWRNTDDX 780 

KACGMAIGNQ SMVNNPDNYK YLIGKAWRNI GISKTANGSK ISVYYLGNST PTENGGNTTN 840 

LPTNTTNNAH SANYALVKNA P FAHS AT PNIi VAINQHDFGT IESVFELANR SKDIDTLYTH 900

SGAQGRDLLQ TLLIDSHDAG YARQMIDNTS TGEITKQLNA ATDALNNVAS LEHKQSGLQT 960 

LSLSNAMILN SRLVNLSRKH TNHINSFAQR LQALKGQEFA SLESAAEVLY QFAPKYEKPT 1020

NVWANAIGGA SLNSGSNASL YGTSAGVPAF LNGNVEAIVG GFGSYGYSSF SNQANSLNSG 1080

ANNANFGVYS RFFANQHEFD FEAQGALGSD QSSLNFKSTL LQDLNQSYNY LiAYSATARAS 1140 

YGYDFAFFRN ALVLKPSVGV SYNHLGSTNF KSNSQSQVAL KNGASSQHLF NANANVEARY 1200 

YYGDTSYFYL HAGVLQEFAH FGSNDVASLN TFKINAARSP LSTYARAMMG GELQLAKEVF 1260 

LNLGWYLHN LISNASHFAS NLGMRYSF 1288
<212> Type : PRT 
<211> Length : 1288 

SequenceName : SEQ ID 93 

SequenceDescription : 

Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MKKHILSLTL GSLLVSTLSA EDDGFYTSVG YQIGEAAQMV TNTKGIQDLS DRYESLNNLL 60 

NRYSTLNTLI KL SAD P SAIN AVRENLGASA KNLIGDKANS PAYQAVLLAI NAAVGFWNW 120 

GYVTQCGGNA NGQKSISSKT I FNNEPGYRS TSITCSLNGH SPGYYGPMSI ENFKKLNEAY 180 

QILQTALKRG LPALKENNGK VNVTYTYTCS GDGNNNCSSQ VTGVNNQKDG TKTKIQTIDG 240 

KSVTTTISSK WDSRADGNT TGVSYTEITN KLEGVPDSAQ ALLAQASTLI NTINNACPYF 300

HASNSSEANA PKFSTTTGKI CGAFSEEISA IQKMITDAQE LVNQTSVINE HEQTTPVGNN 360 

NGKPFNPFTD ASFAQGMLAN ASAQAKMLNL AEQVGQAINP ERLSGTFQNF VKGFLATCNN 420

PSTAGTGGTQ GSAPGTVTTQ T FAS GCAYVG QTITNLKNSI AHFGTQEQQI QQAENIADTL 480 

VNFKSRYSEL GNTYNSITTA LSNIPNAQSL QNAVSKKNNP YSPQGIDTNY YLNQNS YNQ I 540 

QTINQELGRN PFRKVGIVSS QTNNGAMNGI GIQVGYKQFF GQKRKWGARY YGFFDYNHAF 600 

IKSSFFNSAS DVWTYGFGAD ALYNFINDKA TNFLGKNNKL SVGLFGGIAL AGTSWLNSEY 660 

VNLATMNNVY NAKMNVANFQ FLFNMGVRMN LARPKKKDSD HAAQHGIELG LKIPTINTNY 720 

YSFMGAELKY RRLYSVYLNY VFAY 744 
<212> Type : PRT 
<211> Length : 744 

SequenceName : SEQ ID 94 

SequenceDescription : 

Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MIKKAKKFIP FFLIGSLLAE DNGWYMSVGY QIGGTQQFIN NKQLLENQNI INSITQSAIN 60 

IAGPTTGLIT LSSQTVIDAL GYGVSNTVGN QLEGISNILN QIGKRKDFYS SRQISSISQQ 120

IIGLKGSSDP LKAHSSQITA KLLSNTQSAF DQGIALSSNI ISAVNSLNPS NNSQEVKAQL 180 

QNTAQSMAEL LQQIEHSITK TTSTTYAQSL LSNLTDAVNA SSNNTTYVSA LVNALNTLGV 240

GVFPTTTSTH WLNPPGQW FYPTNSLLGS TSSNSNNQQQ YNNTLLMNTL QGELSTNNQN 300

NPNGCANQIQ CLEQFIQNLT PLAATPTSTN QANQQVQAIA QKLQSVAINA LDNNAINNTT 360 

YNLNNLHNAL NFQAYQSTIE QYNNALKQIS WISFSEPKNL LKNTSNNYQI GTVTNDQGQN 420

ISAYDCTSAT GSLSSDASSG ISCSATSSTN NTNSFDNSLV ATSKVQTING KEQIGVNSFN 480 

LVSQVWSVYN SLKTSEENLQ KNAKILCNNG SQSGTSPCNS SSGGLSISGN AQLQNILSPT 540 

NGTTTNTQAK SNASKLKAMV MVNNEEEAKT TNFNQSSGPT TQSSNSTVMG ALNTVLQNVS 600 

NFQQSIQSAF QNQENNIQAW ANALYNTSNP NGNQSQNLTT NNNQDLRIQL RANFYQLINT 660 

INQQVPTDMN ALINQSQQTQ QTSGSASTTN NACASGMGSS GNWCYQQWSD SKAYYSGLQS 720 

ALGYQTQATT QNGSSGGSNI TYNVQQITLT SGGLLNQIIT NLKSVNGGSN GGSSGNGTSQ 780

INTAYQMLTD ASDGKLGTYN SSNSSNSSNS GNNNGYTPCN STNGSNGTSG SNCYEPNKQQ 840 

NATTATTTTD SNLQKVYNDA QKIANIIASS GNNKGVENGL KQFFEALKSN SSSLSNLCGN 900

GSSGSSSTCS GGLINLLGAI PTNGVSDTNN LINLLTEFIK TAGFIQNKDS NVSTSLTSAF 960 

QAITSAISQG FQALQNDISP NAILTLLQEI TSNTTTIQSF SQTLRQLLGD KTFFMVQQKL 1020 

IDAMINARNQ VQNAQNQANN YGSQPVLSQY AAAKSTQHGM SNGLGVGIGY KYFFGKARKL 1080






GLRHYFFFDY GFSEIGLANQ SVKANIFAYG VGTDFLWNLF RRTYNTKALN FGLFAGVQLG 1140 
GATWLSSLRQ QIIDNWGNAN DIHSTNFQVA LNFGVRTNFA EFKRFAKKFH NQGVISQKSV 1200 
EFGIKVPLIN QAYLNSAGAD VSYRRLYTFY INYIMGF 1237 
<212> Type : PRT 
<211> Length : 1237

SequenceName : SEQ ID 95 

SequenceDescription : 

Sequence 

<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MKQNLKPFKM IKENLMTQSQ KVRFIiAPLSL ALSLSFNPVG AEEDGGFMTF GYELGQWQQ 60 
VKNPGKIKAE ELAGLLNSTT TNNTNINIAG TGGNVAGTLG IJIiEMNQIiGNL IDLYPTLKTN 120

NLHQCGSTNS GNGATAAAAT NNSPCFQGNL ALYNEMVDSI KTLSQNISKN IFQGDNNTTS 180

ANLSNQLSEL NTASVYLTYM NSFLNANNQA GGIFQNNTNQ AYENGVTAQQ IAYVLKQASI 240 
TMGPSGDSGA AGAFLDAALA QHVFNSANAG NDLSAKEFTS LVQNIVNNSQ NALTLANNAN 300

ISNSTGYQVS YGGNIDQARS TQLLNNTTNT LAKVTALNNE LKANPWLGNF AAGNSSQVNA 360

FNGFITKIGY KQFFGENKNV GLRYYGFFSY NGAGVGNGPT YNQVNLLTYG VGTDVLYKVF 420 

SRSFGSRSLN AGFFGGIQLA GDTYISTLRN SPQLASRPTA TKFQFLFDVG LRMNFGILKK 480

DLKSHNQHSI EIGVQIPTIY NTYYKAGGAE VKYFRPYSVY WVYGYAF 527 
<212> Type : PRT 
<211> Length : 527 

SequenceName : SEQ ID 96 

SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99

<400> PreSequenceString :

MKKTLLLSLS LSFGLHAEDD GFYASAGIRI GEAAQMVKNT KGIQQLSENY EKLNNLLNNY 60 

NTLNTLVKLS SDPSAVNDAR DNLGSSTRNL LDVKANSPAY QAVLLALNAA VGLWQVTSYA 120

FTACGPGSNE NANGGIQTFN NVPGQNTTTI TCNSYYEPGH GGPISTKNYA IINKAYQIIQ 180 

KALTANGEGI PVLSNTTTKL DFTINGDKRT GGEPNKKLVY PWSHGKAIST SWNATITAPT 240 

TENINTTNSA QELLKQASII ITTLNSACPN FQNGGSGYWA GISGNGTMCG MFKNEISAIQ 300

GMIANAQEAV AQAKIVSENT QNQNSLDAGK PFNPYTDASF AESMLKNAQA QAEILNQAEQ 360

WKNFEKIPT AFVNDSLGVC YEVQGGERRG TNP GQTTSNT WGAGCAYVGQ TITNLKNSIA 420 

HFGTQEQQIQ QAENIADTLV NFKSRYSELG NTYNSITTAL SNIPNAQSLQ NAVS KKNNP Y 480 

SPQGIDTNYY LNQNSYNQIQ TINQELGRNP FRKVGIVSSQ TNNGAMNGIG IQVGYKQFFG 540 

QKRKWGARYY GFFDYNHAFI KSSFFNSASD VWTYGFGADA LYNFINDKAT NFLGKNNKLS 600

VGLFGGIALA GTSWLNSEYV NLATMNNVYN AKMNVANFQF LFNMGVRMNL ARPKKKDSDH 660 

AAQHGIELGL KIPTINTNYY SFMGAELKYR RLYSVYLNYV FAY 703 
<212> Type : PRT 
<211> Length : 703 

SequenceName : SEQ ID 97

SequenceDescription : 



Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MIKKNRTLFL SLALCASISY AEDDGGFFTV GYQLGQVMQD VQNPGGAKSD ELARELNADV 60 

TNNILNNNTG GNVAGALSNA FSQYLYSLLG AYPTKLNGND VSANALLSGA VGSGTCAAAG 120 

TAGGTTLNTQ SACTAAGYYW LPSLTDRILS TIGSQTNYGT NTNFPNMQQQ LTYLNAGNVF 180 

FNAMNKALEK NGTATANSTS STSGATGSDG QTYSQQAIQY LQGQQNILNN AANLLKQDEL 240

LLEAFNSAVA ANIGNKEFNS AAFTGLVQGI IDQSQLVYNE LTKNTISGSA VNNAGINSNQ 300

ANAVQGRASQ LPNALYNVQV TLDKINALNN QVRSMPYLPQ FRAGNSRATN ILNGFYTKVG 360

YKQFFGKKRN IGLRYYGFFS YNGASVGFRS TQNNVGLYTY GVGTDVLYNI FSRSYQNRSV 420

DMGFFSGIQL AGETFQSTLR DDPNVKLHGK INNTHFQFLF DFGMRMNFGK LDGKSNRHNQ 480

HTVEFGWVP TIYNTYYKSA GTTVKYFRPY SVYWSYGYSF 520



<212> Type : PRT 

<211> Length : 520 

SequenceName : SEQ ID 98 
SequenceDescription : 


Sequence 
<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 
MKKKFLSLTL GSLLVSALSA EDNGFFVSAG 
TNYSVLNALI RQSADPNAIN NARGNLNASA 
SYAISPCGPG KDTSKNGGVQ TFHNTPSNQW
AYQIIQKAFG SSGKDIPALS DTNTELKFTI 
TIITTLNSAC PWINNGGAGG ASSGSLWEGI 
EQSKIVAANA QNQRNLDTGK TFNPYKDANF 
EFVKDSLGVC HEVQNGHLRG TPSGTVTDNT 

NARNLAYTLA NFSSQYQKLG EHYDSITAAI
IDSNIHSQVQ SRSQELGSNP FRRAGLIAAS 
YGFVDYNHTY NKSQFFNASS DVWTYGVGSD 
WLNSQYVNLA NVNNYYKAKI NTANFQFLFN 
TINTNYYSLL GTTLQYRRLY SVYLNYVFAY

<212> Type : PRT

<211> Length : 690 

SequenceName : SEQ ID 99 
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString :

MKIKKSLFAL SFSLMASLSR AEDDGFYMSV GYQIGEAVQK VKNTGALQNL ADRYDNLSNL 60 

LNQYNYLNSL VNLASTPSAI TGAIDNLSSS AINLTSATTT SPAYQAVALA LNAAVGMWQV 120

IAFGISCGPG PNLGPEHLEN GGVRSFDNTP NYSYNTGSGT TTTTCNGASM VGPNGILSSS 180

EYQVLNTAYQ TIQTALNQNQ GGGMPALNSS KNMWNINQT FTKNPTTEYT YPDGNGNYYS 240 

GGSSIPIQLK ISSVNDAENL LQQAATIINV LTTQNPHVNG GGGAWGFGGK TGNVMDIFGD 300

SFNAINEMIK NAQAVLEKTQ QLNANENTQI TQPDNFNPYT SKDTQFAQEM LNRANAQAEI 360

LSLAQQVADN FHSIQGPIQQ DLEECTAGSA GVINDNTYGS GCAFVKETLN SLEQHTAYYG 420

NQVNQDRALS QTILNFKEAL STLGNDSKAI NSGISNLPNA KSLQNMTHAT QNPNSPEGLL 480 

TYSLDTSKYN QLQTVAQELG KNPFRRIGVI NYQNNNGAMN GIGVQAGYKQ FFGKKRNWGL 540 

RYYGFFDYNH AYIKSNFFNS ASDWTYGVG MDALYNFIND KNTNFLGKNN KLSVGLFGGF 600 

ALAGTSWLNS QQVNLTMMNG IYNANVSASN FQFLFDLGLR MNLARPKKKD SDHAAQHGME 660 

LGVKIPTINT DYYSFMGAEL KYRRLYSVYL NYVFAY 696



<212> Type : PRT

<211> Length : 696 

SequenceName : SEQ ID 100 
SequenceDescription : 


Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MKIKKSLFAL SFSLMASLSR AEDDGFYMSV GYQIGEAVQK VKNTGALQNL ADRYDNLSNL 60

LNQYNYLNSL VNLASTPSAI TGAIDNLSSS AINLTSATTT SPAYQAVALA LNAAVGMWQV 120

IAFGISCGPG PNLGPEHLEN GGVRSFDNTP NYSYNTGSGT TTTTCNGASN VGPNGILSSS 180

EYQVLNTAYQ TIQTALNQNQ GGGMPALNSS KNMWNINQT FTKNPTTEYT YPDGNGNYYS 240

GGSSIPIQLK ISSVNDAENL LQQAATIINV LTTQNPHVNG GGGAWGFGGK TGNVMDIFGD 300

SFNAINEMIK NAQAVLEKTQ QLNANENTQI TQPDNFNPYT SKDTQFAQEM LNRANAQAEI 360

LSLAQQVADN FHSIQGPIQQ DLEECTAGSA GVINDNTYGS GCAFVKETLN SLEQHTAYYG 420

NQVNQDRALS QTILNFKEAL STLGNDSKAI NSGISNLPNA KSLQNMTHAT QNPNSPEGLL 480 

TYSLDTSKYN QLQTVAQELG KNPFRRIGVI NYQNNNGAMN GIGVQAGYKQ FFGKKRNWGL 540

RYYGFFDYNH AYIKSNFFNS ASDWTYGVG MDALYNFIND KNTNFLGKNN KLSVGLFGGF 600 

ALAGTSWLNS QQVNLTMMNG IYNANVSASN FQFLFDLGLR MNLARPKKKD SDHAAQHGME 660

LGVKIPTINT DYYSFMGAEL KYRRLYSVYL NYVFAY 696
<212> Type : PRT 
<211> Length : 696 

SequenceName : SEQ ID 101 



SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString :

MHKKVLLALT ASLICQESLF AKDKDYTLGK VSTAGKKDRS DYSGQVNLGY SGITAPKSWQ 60 
DEEVKKYTGS RTVISNKALT QQANQSIEEA LQNVPGLQIR NATGVGAMPT IQIRGFGAGG 120 



YQIGESAQMV KNTKGIQDLS 
KNLINDKKNS PAYQAVLLAL 
GGTTITCGTT GYEPGPYSIL 
NKNNGNTNTN NNGEEIVTKN 
YLKGDGSACG IFKNEISAIQ 
AQSMFANAKA QAE ILNRAQA 
WGAGCAYVGE TVTNLKDS I A 
SSLPDAQSLQ NWS KKTNPN 
TTNNGAMNGI GFQVGYKQFF 
LLVNF INDKA TKHNKISFGA 
LGLRMNLARK KHRATDNAAQ 



DSYERLNNLL 60 

NAAAGLWQVM 120 

STENYAKINK 18 0 

NAQVLL EQAS 240 

DM I KNAAI AV 3 00 

WKDFERIPA 3 60 

HFGDQAERIH 42 0 

SPQGIQDNYY 480 

GKNKRWGARY 540 

FGGIALAGTS 600 

HGIELGTKIP 660 

690 
SGHSDATLML VNGIPVYMAP YAHIELDIFP VTFQAIDRID VIKGGGSVQY GPNTYGGIVN
IITKPIPNQW ENQAAERITY WAKARNAGFA APPDKTGDPS FIKSLGNNLL YNTYVRSGGM 
TNKHVGIQAQ ANWVRGQGFR DMSPSSISNY WLDGVYDINE SNGIKAYYQY YDFAXAQPGS 
LSEQDYKINR FANLRPLNQK GGRS QRFGAV YENRFGDLDR VGGTFSFTYY GQLMTRDFQV 
SSSYNSANMV TCFSEAACRA AGLPAGYNLA VPYYATNYNG WAEVENPVRS INNAFEPKVN
LIVNTGKVRQ TFIMGLRFMT TTFLQRQYLN TNECATKTSG EGAGFLCEGP NVMSGWKPHI 
KHGVYRNWNN WRNNYTAVYL SDRIEAWDGR FFIVPGLRYA FVQYNNENAS NWMQI PEKDL 
RKIKHMNNWM PSTNIGFIPV QGDHNVLTYF NYQRSFVPPQ LDVLSYGGAE YFTQHFDTVE 
AGARYTYKDK FSFNADYFRI WARD FATGQ Y SVYTSGPMKG NVRPINGYSQ GVELELYYRP 
IRGLQFHAAF NYIDTRVTSH GPLTDLNGDV LKGTSYNKHF PFVSPFQFIF DARYNWRKTT
IGISSYFYSR AYSGI SNSAA GGYYGMQYYS GGNNYESVLN SGYQCEAWCM TQHEGLLPWY 
WVWNIQVSQI FWENGRHRVT GSLQINNIFN MKYYFTGIGS SPAGLQPAPG RSVTAYLNYT 
F 

<212> Type : PRT 
<211> Length : 841

SequenceName : SEQ ID 102 
SequenceDescription : 

Sequence 

<213> OrganismName : Helicobacter pylori J99

<400> PreSequenceString : 

MKKTLLLSLS ASSLLNAEDN GFFISAGYQI 

NNLNQAVTNA SSPSEINAAI DNLKANTQGL 
VQCGPGNSGQ QSVTFEGQPG HNSSSINCNL

KQDSGFPVLD SAGKQVTITI TTQTNGANKS 

PWVNHNQGQN GGAPWGLDTA GNVCQVFATE 

QDFNPYTSAD RAFAQNMLNH AQAQAKILEL 

PDAGVTNNTW GAGCAYVEET ITALNNSLAH 
TYNSITTTAS NTPNSPFLKN LISQSTNPNN

RRVGLISSQT NNGAMNGIGV QVGYKQFFGE 

FTYGVGTDVL YNFINDKTTK NSKISFGVFG 

ANFQFLFNLG LRMNIAKNKK KASDHAAQHG 

YLNYVFAY 
<212> Type : PRT

<211> Length : 668 

SequenceName : SEQ ID 103 
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString :
MRKLFIPLLL FSALEANEKN GFFIEAGFET 

ILKRAANLFT NAEAISKLKF SSLSPVRVLY
IDLGVIETIP KHSKIVLPGE AFDS LKEAFD 
LNNIKTNLIM KYSNENPNNF NT CP YNNNGN 
QSWGDAILNA PFEFTNSSTD CDSDPSKCVN 
EIDAWLKNS GWGLANGYG NDGEYGTLGV 

SHTKGYGHNG NMTYQRVPVT KDGQVEKDSN
NSIYHNCADV PAGFLGVTAA WQQLINQNA 
STIQKTFVTS SVTNHHFSNA SQSFRSPILG 
VNQKVQQLSY GGGIDLLLDF ITTYSNKNSP 
KVKGSGNLDV ATGLNYRYKH SKYSVGISIP 

FFNYGWVF

<212> Type : PRT 
<211> Length : 668 

SequenceName : SEQ ID 104 
SequenceDescription : 


Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 
MNKTTIKILM GMALLSSLQA AEAELDEKSK KPKFADRNTF YLGVGYQLSA INTSFSTSSI 60
DKSYFMTGNG FGWLGGKFV AKTQAVEHVG FRYGLFYDQT FSSHKSYIST YGLEFSGLWD 120 
AFMSPKMFLG LEFGLGIAGA TYMPGGAMHG IIAQYLGKEN SLFQLLVKVG FRFGFFHNEI 180 



180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
841 



GEAAQMVKNT GELKKLSDTY 
I GE KTNS PAY QAVYLALNAA 
TGYNNGVSGP LSIENFKKLN 
ETTTTTTTTN DAQTLLQEAS 
FSAVTSMIKN AQEIVTQAQS 
ADQMKKDLNT IPSQFITNYL 
FGTQAEQIKQ SELLARTILD 
PGGLQAVYQV NQSAYSQLLS 
KRRWGLRYYG FFDYNHAYIK 
GIALAGTSWL NSQYVNLATF 
VELGVKIPTI NTNYYSLLGT 



ENLSNLLTNF 60 

VGLWNVIAYN 120 

QAYQTIQQAL 180 

KM I SVLTTNC 240 

LNQQNNQNAP 3 00 

AACHNGGGTL 3 60 

FRGSLSNLNN 42 0 

ATQELGHNPF 480 

SSFFNSASDV 540 

NNFYSAKMNV 600 

QLQYRRLYSV 660 

668 



GLLEGTQTQE KRHTTTKNTY 
MYNGQLTIEN FLPYNLNNVK 
KIDPYTLFLP KFEATSTSIS 
TKNDCWQNFT PQTAEEFTNL 
PGVNGRVDTK VDQQYILNKQ 
EAYALD PKKL FGNDLKTINL 
GKPKDSDGLP YNVCSLYGGS 
LPINYANLGS QTNYNLNASL 
VNAKIGYQNY FNDFIGLAYY 
TGIQTKRNFS SSFGIFGGLR 
LIQRKASWS SGGDYTNSFV 



ATYNYLPTDT 60 

LSFTDAQGNT 120 

DTNTQRVFET 18 0 

MLNMIAVLDS 24 0 

GIINNFRKKI 3 00 

EDLRTILHEF 360 

NQPAFPSNYP 420 

NTQDLANSML 480 

GI I KYNYAKA 540 

GLYNSYYVLN 600 

FNEGASHFKV 660 

668 
TFGLKFPVIP NKKTEIVDGL SATTLWQRLP VAYFNYIYNF 220
<212> Type : PRT 
<211> Length : 220 

SequenceName : SEQ ID 105 
SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MKKTKKTILL SLTLAASLLH AEDNGVFLSV GYQIGEAVQK VKNADKVQKL SDVYEQLSKL 60 

LANDNGTSSK TSAQAINQAV NNLNESAKTL AGGTTNSPAY QATLLALRSA LGLWNSMGYA 120

WCGGYIKKP GENNQKNFHY TDENGNGTTI NCGGSTNSNG THSPNGTNTL KADKMVSLSI 180
EQYEKIKEAY QILSKALKQA GLAPLNSKGE KLEAHVTTSK DQQGTSSDQT TTTTSVIDTT 240

NDAQNLLTQA QTIVNTLKDY CPMLIAKSSS NGGTNGANTP SWQTAGGGKN SCATFGAEFS 300

AISDMISNAQ KIVQETQQLN ANQPKNITQP NNFNLNSPGS LTALAQSMLK NAQSQTEILK 360

LANQVASDFD KLSSGYLKDY IGKCDVSGVS SSNMTPQNMN TTWGKGCAGV EETLTSLKAS 420

TTDFNNQTTP QLDQAQTLAN TLTQELGNNP FKRVGIIGSQ TNNGAMNGLG VQAGYKQFFG 480 

QKRRWGLRYY GFFDYNHTYI KSSFFNSSSD VLTYGVGSDL LFNFINDKNT NFLGKNNKIS 540 

VGLFGGIALA GTSWLNSQFV NLKTISNVYS AKVNTANFQF LFNLGLRTNL ARPKKKDSDH 600

SAQHGMELGV KIPTINTNYY SYLGTKLEYR RLYSVYLNYV FAY 643 
<212> Type : PRT 
<211> Length : 643 

SequenceName : SEQ ID 106 



SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MKKTILLSLM VSSLFAENDG VYMSVGYQIG EAAQMVKNTG EIQKVSNAYE NLNNLLTRYN 60 

ELKQTASNTD SSTAQAIDNL EKSASRLKTT PNTANQAVSS ALSSAVGMWQ VIASNLANNS 120

LS S S EYjBKLK ATSQLLQNTL ENKNItfNLKIE NDYDQLLTQA STIINTLQSQ CPGVDGGNGK 180 

PWGINTSGNA CAIFGSTFNA INSMIDSAKK AAADARRTAP ESPNQQNAFT NADFNKNLNQ 240

VSSVINDTIS YLKGDNLETI YNTIQKTPNS KGFQSLVSRS SYSYSLNETQ YSQFQTTTKE 300

FGHNPFRSVG LINSQSNNGA MNGVGVQLGY KQFFGKNKFF GIRYYGFFDY NYAYIKSNFF 360

NSASNVFTYG AGSDLLLNFI NGGSDRNRKV SFGIFGGIAL AGTTWLNNQS ANLKITNSAY 420 

SAKINNTNFQ FLFNTGLRLQ GIHHGIELGV KIPTINTNYY SFMGAKLAYR RLYSLYLNYV 480

LAY 483 

<212> Type : PRT

<211> Length : 483 

SequenceName : SEQ ID 107 
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MPKASQVLFF GAFLSTSLQG FEAKLNGFVD QSSTIGFNQH KINKERGIYP MQQFATIAGY 60 

LGLGFSLLPK KVSDHVLKGK IGGMVGSIFY DGTKKFEDGS VAYNLFGYYD GFMGVYTNIL 120

QTDSLETQNM KHNKNVRNYV FSDAYLEYAY KNYFEIKAGR YLSTMPYKSG QTQGFQVSGQ 180

YKHARLTWFS SWGRAFAYGS FLMDWFAART TYSGGFTKNN NGGYDSHGRK VLYGTHAVQL 240

TYKPHRFLIE GFYYLSPQIF NAPGVKIGWD SNPNFSGTGF RSDTAIIGFF PIYYPWMIVK 300

SNGSPVYRYD TPATQNGQNL IIRQRFDINN YNVSIAFYKV FQNANGWIGN MGNPSGVIMG 360

SNSVYAGFTG TALKRDAATI FLSCGGTHFA KKFTWKFATQ YSNSWSWEA RAMISLGYKF 420

TEYLSGSVDL AYYGVHTNKG FKPGENGPVP KNFPALYSDR SALYTALVAS F 471 



<212> Type : PRT 
<211> Length : 471 
SequenceName : SEQ ID 108

SequenceDescription : 

Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MLRLVSKTIC LSLISLFNPL EAFQKHQKDV FFVEAGFETG LLEGAQTKEQ AIAQNTQNTQ 60 
KIYENPLTHP QTKEQPKEQN KSDTATPQSV YGRYYILQNT ILEKATELFT AANINGNGLT 120

FYSQNPVYVM AYNKDNAEFE GYGNNSVWI QNFLPYNLNN IELSYTDAQG KAVNLGVIET 180

IPKDSQIILP ASLFNNFSND SPFNSDGLQQ LQTTTTPFSD ANTQSLFEKL SQITTNLQMT 240

YENTDPFSSG NNDPNGPLAS PKPHYECPGY KKSCQVASVS FTPQTAEELT NLMLDMIAVF 300

DSKSWEEAVL NAPFQFSNSP SECGIDYPKC VNPFNNGLVD PKDEKYALTP EEVINSYRVA 360

NELTVNLLNA AKGFLGLGSQ LGSANAPDDD GFNQGVLGIA PFALDPEKLF GKNLNKVAIL 420

ALRDIIHEYG HTLGYTHNGN MTYQRVRLCQ EGNGPEARCE GGHEVEKNGK EELEFSNGHE 480

VRDHDGYTYD VCSRFGGKNQ PAFPSNYPNS IYTNCAQVPA GLIGVTTAVW QQLINQNALP 540

INFANLNSQT SHLNAGLNAQ NFATSMVSAI AQNFSTTSTT TYRSSSKNFR SPILGVNVKI 600

GYQHYFNDYI GLAYYGIIQY NYAQANDEKI QQLSYGGGMD VLFDFITTYT NKKQDHPTKK 660

VFASSFGVFG GLRGLYNSYY VFNQVKGSGN LDIVTGFNYR YKHSKYSIGV SVPLIQSGIK 720 

IASNNGIYAD SWLNEGGS H FKVFFNYGWV F 751 
<212> Type : PRT 

<211> Length : 751

SequenceName : SEQ ID 109

SequenceDescription : 



Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MQNFVFNKKW LIYSSLLPLF FLNPLMAEDD GFFMGVSYQT SLAVQRVDNS GLNASQDAST 60 

YIRQNAIALE SAAVPLAYYL EAMGQQTRVL MQMLCPDPSK RCLLYAGGYQ NGQNNNGDTG 120 

NNPPRGNVNA TFDMQSLVNN LNKLTQLIGE TLIRNPENLP NSKVFNVKFG NQSTVIALPE 180 

GLANTMDALN NDITNALTTL WYNQTLTNKS FSTPSNTSVN FSPQVLQHLL QDGLATANNN 240

QTICSTQNQC TATNEAKSIA QNAQNIFQAL MQAGILGGLA NEKQFGFTYN KAPNGSDSQQ 300

GYQSFSGPGY YTKNDNTTQA PLKALPAGAT IGSGNGQYTY HPSSAVYYLA DSIIANGITA 360

SMIFSGMQNF ANKAAKLIGT SSYNQMQDAI NYGE S LIi SNT VAYGDFITNW VAPYLDLNNK 420

GLNFLPNYGG QLMGANNQTP QLTPQQAQQE QKVIMNQLEQ ATNAPTPAQI NRILANPYSP 480

TAKTLMAYGL YRSKAVIGGV IDEMQTKVNQ VYQMGFARNF LEHNSNSNNM NGFGVKMGYK 540

QFFGKKRMFG LRYYGFYDFG YAQFGTESSL VKATLSSYGA GTDFLYNVFT RKRGTEAIDI 600 

GFFAGIQLAG QTWKTNFLDQ VDGNHLKPKD TSFQFLFDLG IRTNFSKIAH QKRSRFSQGI 660 

EFGLKIPVLY HTYYQSEGVT AKYRRDFSFY VGYNIGF 697 
<212> Type : PRT 

<211> Length : 697

SequenceName : SEQ ID 110 
SequenceDescription : 



Sequence 

<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MKKTILLSLS LSLASSLLHA EDNGFFVSAG YQIGEAVQMV KNTGELKNLN EKYEQLSQYL 60 

NQVASLKQSI QNANNIELVN SSLNYLKSFT NNNYNSTTQS PIFNAVQAVI TSVLGFWSLY 120 

AGNYLTFFW NKDTQKPASV QGNPPFSTIV QNCSGIENCA MNQTTYDKMK KLAEDLQAAQ 180

QNATTKANNL CALSGCATTQ GQNPSSTVSN ALNLAQQLMD LIANTKTAMM WKNIVIAGVS 240 

NVSGAIDSTG YPTQYAVFNN IKAMIPILQQ AVTLSQSNHT LSASLQAQAT GSQTNPKFAK 300

DIYAFAQNQK QVISYAQDIF NLFSSIPKDQ YRYLEKAYLK IPNAGKTPTN PYRQEVNLNQ 360

EIQTIQNNVS YYGNRVDAAL SVAKDVYNLK SNQTEIVTTY NNAKNLSQEI SKLPYNQVNT 420

KDIITLPYDQ NAPAAGQYNY QINPEQQSNL SQALAAMSNN PFKKVGMISS QNNNGALNGL 480

GVQVGYKQFF GESKRWGLRY YGFFDYNHGY IKSSFFNSSS DIWTYGGGSD LLVNFINDSI 540

TRKNNKLSVG LFGGIQLAGT TWLNSQYMNL TAFNNPYSAK VNASNFQFLF NLGLRTNLAT 600 

AKKKDSERSA QHGVELGIKI PTINTNYYSF LGTKLEYRRL YSVYLNYVFA Y 651 

<212> Type : PRT

<211> Length : 651 

SequenceName : SEQ ID 111 
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MLKLASKTIC LSLISSFTAV EAFQKHQKDG FFIEAGFETG LLQGTQTQEQ TIATTQEKPK 60 

PKPKPKPITP QSTYGKYYIS QSTILKNATE LFAEDNITNL TFYSQNPVYV TAYNQESAEE 120

AGYGNNSLIM IQNFLPYNLN NIELSYTDDQ GNWSLGVIE TIPKQSQIIL PASLFNDPQL 180 

NADGFQQLQT NTTRFSDAST QNLFNKLSKV TTNLQMTYIN YNQFSSGNGS GSKPPCPPYE 240 
NQANCVAKVP PFTSQDAKNL TNLMLNMMAV
LTCVNPYNDG LVDPKLIAKN KGDEYNIENG 
GGDLGLGSQY GGPNGPGDDG TNFGALGILS 
TLGYTHNGNM TYQRVRMCEE NNGPEERCQG 
CSRFKDKPYT AGSYPNSIYT DCSQVPAGLI
HAS LNTQDFA TTMIiSAISQS LSSSKSSATT 
LGLSSYGIIK YNYAQANNEK IQQLSYGVGM 
VFGGLRGLYN SYYLLNQYKG SGNLNVTGGL 
YTNSITLNEG GSHFKVFFNY GWIF 
<212> Type : PRT

<211> Length : 744 

SequenceName : SEQ ID 112 
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString :
MRKLFIPLLL FSALEANEKN GFFIEAGFET 

ILKRAANLFT NAEAISKLKF SSLSPVRVLY
IDLGVIETIP RHSKIVLPGE AFDSLKIDPY 
TNLWNYRNE NKFKDHENHW EAFTPQTAEE 
SPTDCDMDPS KCVNPGTNGL VNSKVDQKYV 
SDITPSNNDD GKHYGQLGW ASALDPKKLF 

MTYQRVPVTK DGQVEKDSNG KPKDSDGLPY
AGFLGVTAAV WQQLINQNAL PINYANLGSQ 
VTNHHFSNAS QSFRSPILGV NAKI GYQNYF 
GGIDLLLDFI TTYSNKNSPT GIQTKRNFSS 
TGLNYRYKHS KYSVGISIPL IQRKASWSS 


<212> Type : PRT 
<211> Length : 657 

SequenceName : SEQ ID 113

SequenceDescription : 


Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MSLATSYNVS NNFSKFNIKR VRGYLICLVC MTPKMIQRGL NGVSFYGCSD YVNKGDCKGV 60
LREINGSMKM VCLHCENTPI MEKVESGRGG AYACKNCNRK FYFIDLAKQN ERKKDLEKEK 120

KELLNKIEKQ KIKHLERFIL AGVKANIKEN SFFLGCKNYP KCEWTASMDS QDLKCPKCNR 180
LMKRKKNFKN NEFFTATSLT LNAIEFCLYI NLKKKETNV 219 
<212> Type : PRT 

<211> Length : 219

SequenceName : SEQ ID 114 
SequenceDescription : 

Sequence 

<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MEIKKYFLYA LFFLLFSGLF LSKLQAYKFN MSIVGKVSSY TKFGFNNQRY QPSKDIYPTG 60
SYTSLLGELN LSMGLYKGLR AEVGAMMAAL PYDSTAYQGN NIPNGQPGSR TDPFGAGIFW 120

QYIGWYAGHS GLNVQKPRLA MVHNAFLSYN YKKDKFSFGV KGGRYDAEEY DWFTSYTQGV 180
EGFVKYKDTR LRVMYSDARA SASSDWFWYF GRYYTSGKAL MIADLKYEKD NLKINPYFYA 240
IFQRMYAPGI NITYDTNPNF NNKGFRFVGT FVGFFPIFAT PANQNDIILF QQVPLGKSGQ 300

TYFFRTRFYY NKWQFGGSVY KNIGNANGDI GIYGDPLGYN IWTNSIYDAE INNIVGADVI 360

NGFLYVGSQY RGFSWKILGR WTDSPRADER SLALFLSYFS NKYNIRMDLK LEYYGNITKK 420

GYCIGYCGMY VPVDPNGPGT QPLTHNVYSD RSHIMFNITY GFRIY 465
<212> Type : PRT 
<211> Length : 465 

SequenceName : SEQ ID 115 
SequenceDescription : 


Sequence 



FDSKSWEDAV LNAPFQFSDN 
QTGSVILTPQ DVIYSYRVAN 
PFLDPEILFG KELNKVAIMQ 
GRIEQVDGKE VQVFDNGHEV 
GVT'SAVWQQL IDQNALPVDF 
YRTSKTSRPF GAPLLGVNLK 
DVLFDFITNY TNEKNPKSNL 
NYRYKHSKYS IGISVPLVQL 



NLSAPCYSDY 3 00 

NIYVNLLPTR 3 60 

LRDI IHEYGH 42 0 

RDTDGSTYDV 4 8 0 

TNLSSQTNYL 540 

MGYQKYFNDY 60 0 

TKKVFTSSLG 660 

KSRIVSSDGA 72 0 

744 



GLLEGTQTQE KRHTTTKNTY ATYNYLPTDT 60 

MYNGQLTIEN FLPYNLNNVK LSFTDAQGNV 12 0 

TLFLPKIEAT STSISDANTQ RVFETLNKIK 18 0 

FTNLMLNMIA VLDSQSWGDA ILNAPFEFTN 240 

LNKQDIVNKF KNKADLDVIV LKDSGWGLG 3 00 

GDNLKTINLE DLRTILHEFS HTKGYGHNGN 3 60 

NVCSLYGGSN QPAFPSNYPN SIYHNCADVP 42 0 

TNYNLNASLN TQDLANSMLS TIQKTFVTSS 480 

NDFIGLAYYG 1 1 KYNYAKAV NQKVQQ^SYG 540 

SFGIFGGLRG LYNSYYVLNK VKGSGNLDVA 600 

GGDYTNSFVF NEGASHF KVF FNYGWVF 657 
<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MKKTILLSLS LSLASSLLHA EDNGFFVSAG YQIGEAVQMV KNTGELKNLN DKYEQLSQSL 60 

AQLASLKKSI QTANNIQAVN NALSDLKSFA SNNHTNKETS PIYNTAQAVI TSVLAFWSLY 120 

5 AGNALSFHVT GLNDGSNSPL GRIHRDGNCT GLQQCFMSKE TYDKMKTLAE NLQKAQGNLC 180 

ALSECSSNQS NGGKTSMTTA LQTAQQLMDL IEQTKVSMVW KNIVIAGVTN KPNGAGAITS 240 

TGHVTDYAVF NNIKAMLPIL QQALTLSQSN HTLSTQLQAR AMGSQTNREF AKDXYALAQN 3 00 

QKQILSNASS IFNLFNSIPK DQLKYLENAY LKVPHLGKTP TNPYRQNWL NKEINAVQDN 3 60 

VANYGNRLDS ALSVAKDVYN LKSNQTEIVT TYNDAKNLSE EISKLPYNQV NVTNIVMSPK 42 0 

10 DSTAGQYQIN PEQQSNLNQA LAAMSNNPFK KVGMISSQNN NGALNGLGVQ VGYKQ FFGES 48 0 

KRWGLRYYGF FDYNHGYIKS SFFNSSSDIW T YGGGS DLL V NFINDSITRK NNKLSVGLFG 540 

GIQLAGTTWL NSQYMNLTAF NNPYSAKVNA SNFQFLFNLG LRTNLATAKK KDSERSAQHG 60 0 

VELGIKIPTI NTNYYSFLGT KLEYRRLYSV YLNYVFAY 63 8 
<212> Type : PRT

<211> Length : 638

SequenceName : SEQ ID 116 
SequenceDescription : 

Sequence 


<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MKLKKRKVAA TLLKRLTLPL LFTTGSLGAV TYEVHGDFIN FSKVGFNRSP INPVKGIYPT 60 

ETFVNLTGKL EGSVHLGRGW TVNVGGVLGG QVYDNTRYDR WAKDFTPPSY WDKTS CGTDS 12 0 

25 LSLCMNATKM WQQQGPGGI I DPRGIGYMYM GEWNGLFPNY YPANAYLPGH SRRYEVYKAN 18 0 

LTYDSDRVHM VMGRFDVTEQ EQMDWIYQLF QGFYGTFKLT KNMKFLLFSS WGRGIADGQW 240 

LFPIYREKPW GIHKAGIIYR PTKNLMIHPY VYLIPMVGTL PGAKIEYDTN PEFSGRGIRN 3 00 

KTTFYVLYDY RWNNAEYGRY APARYNTWDP FLDNGKWRGL QGPGGATLYL HHHIDINNYF 3 60 

WGGAYLNIG NPNMNLGTWG NPVALDGIEQ WVGGIYSLGF AGIDNITDAD AFTEYVKGGG 420 

30 KHGKFSWSVY QRFTTAPRAL EYGIGMYLDY QFSKHVKAGL KLVWLEFQIR AGYNP GTGFL 48 0 

GPNGQPLNLN NGLFESSAFA QGPQNMGGIA KSITQDRSHL MTHISYSF 52 8 
<212> Type : PRT 
<211> Length : 528 

SequenceName : SEQ ID 117 

SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MKNFSPLYCL KKLKKRHLIA LSLPLLSYAN GFKIQEQSLN GTALGSAYVA GARGADAS FY 60 
NPANMGFTND WGENRSEFEM TTTVINIPAF S FKVPTTNQG LYSVTSLEID KSQQNILGII 12 0 

NTIGLGNILK ALGNTAATNG LSQAINRVQG LMNLTNQKW TLASKPDTQI VNGWTGTTNF 180 
VLPKFFYKTR THNGFTFGGS FTAPSGLGMK WNGKGGEFLH DVFIMMVELA PSMSYTINKR 24 0 

45 FSVGVGLRGL YATGS FNNTV YVPLEGASVL SAEQILNLPN NVFADQVPSN MMTLLGNIGY 3 00 

QPALNCQ KAG GDMSDQSCQE FYNGLKKIMG YSGLIKASAN LYGTTQWQK SNGQGVSGGY 3 60 

RVGS SLRVFD HGMFSWYNS SVTFNMKGGL VAITELGPSL GSVLTKGSLN INVSLPQTLS 420 
LAYAHQF FKD RLRVEGVFER TFWSQGNKFL VTPDFANATY KGLSGTVASL DSETLKKMVG 4 80 

LANFKSVMNM GAGWRDTNTF RLGVTYMGKS LRLMGAIDYD QAPSPQDAIG IPDSNGYTVA 540 

50 FGTKYNFRGF DLGVAGS FT F KSNRSSLYQS PTIGQLRIFS ASLGYRW 587 
<212> Type : PRT 
<211> Length : 587 

SequenceName : SEQ ID 118 
SequenceDescription : 


Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

60 MAFQVNTNIN AMNAHVQ SAL TQNALKTSLE RLSSGLRINK AADDAS GMTV ADSLRSQASS 60 

LGQAIANTND GMGIIQVADK AMDEQLKILD TVKVKATQAA QDGQTTESRK AIQSDIVRLI 12 0 

QGLDNIGNTT TYNGQALLSG QFTNKEFQVG AYSNQSIKAS IGSTTSDKIG QVRIATGALI 180 

TASGDISLTF KQVDGVNDVT LESVKVSSSA GTG I GVLAE V INKNSNRTGV KAYAS VI TTS 240 

DVAVQSGSLS NLTLNGIHLG NIADIKKNDS DGRLVAAINA VTSETGVEAY TDQKGRLNLR 3 00 

65 SIDGRGIEIK TDSVSNGPSA LTMVNGGQDL TKGSTNYGRL SLTRLDAKSI NWSASDSQH 360 

LGFTAIGFGE SQVAETTVNL RDVTGNFNAN VKSASGANYN AVIASGNQSL GSGVTTLRGA 420 

MWIDIAESA MKMLDKVRSD LGSVQNQMIS TVNNISITQV NVKAAESQIR DVDFAEESAN 480 
FNKNNILAQS GSYAMSQANT VQQNILRLLT 510
<212> Type : PRT 
<211> Length : 510 

SequenceName : SEQ ID 119 
SequenceDescription :



Sequence 



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MAGTQAIYES SSAGFLSQVS SIISSTSGVA GPFAGIVAGA MTAAIIPIW GFTNPQMTAI 60 
MTQYNQSIAE AVSVPMKAAN QQYNQLYQGF NDQSMAVGNN ILNISKLTGE FNAQGNTQSA 120 
QISAVNSQIA SILASNTTPK NPSAIEAYAT NQIAVPSVPT TVEMMSGILG MITSAAPKYA 18 0 

LALQEQLRSQ ASNSSMNDTA DSLDSCTALG ALVGSSKVFF SCMQISMTPM SVSMPTVYAK 24 0 

15 YQAVATKALT SGVNPMTTPA CPIGDKVLAV YCYAEKVAEI LREYYIEFVK NNTNLLQNAS 3 00 

QMILNQSGIiA TSTYDTQAIS NISSLYNYNI VANKSFLKSH LTYLDYIKDK LKGQKDSYLT 3 60 

ERVQTKIIVK 3 70 

<212> Type : PRT 
<211> Length : 370 

SequenceName : SEQ ID 120

SequenceDescription : 



Sequence 



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MTNEAINQQP QTEAAFNPQQ FINNLQVAFI KVDNWASFD PNQKPIVDKN DRDNRQAFEK 60 

ISQLREEFAN KAIKNPTKKN QYFSSFISKS NDLIDKDNLI DTGSSIKSFQ KFGTQRYQ I F 12 0 

MNWVSHQNDP SKINTQKIRG FMENIIQPPI SDDKEKAEFL RSAKQAFAGI IIGNQIRSDQ 180 

30 KFMGVFDESL KERQEAEKNG EPNGDPTGGD WLDIFLSFVF NKKQSSDLKE TLNQEPVPHV 240 

QPDVATTTTD IQSLPPEARD LLDERGNFSK FTLGDMNMLD VEGVAD ID PN YKFNQLLIHN 3 00 

NALSSVLMGS H2TGI EPEKVS LLYGNNGGPE ARHDWNATVG YKNQRGDNVA TLINVHMKNG 3 60 

SGLVIAGGEK GINNPSFYLY KEDQLTGSQR ALSQEEIQNK VDFME FLAQN NAKLDNLSKK 42 0 

EKEKFQNEIE DFQKDSKAYL DALGNDHIAF VSKKDKKHLA LVAEFGNGEL SYTLKDYGKK 480 

35 ADKALDREAK TTLQGSLKHD GVMFVDYSNF KYTNASKSPD KGVGATNGVS HLEAGFSKVA 540 

VFNLPNLNNL AITSWRQDL EDKLIAKGLS PQEANKLVKD FLSSNKELVG KALNFNKAVA 600 

EAKNTGNYDE VKQAQKDLEK SLKKRERLEK DVAKNLESKS GNKNKMEAKS QANSQKDEIF 660 

ALINKEANRD ARAIAYAQNL KGIKRELSDK LENINKDLKD FSKSFDEFKN GKNKDFSKAE 720 

ETLKALKGSV KDLGINPEWI SKVENLNAAL NEFKNGKNKD FSKVTQAKSD LENSIKDVII 78 0 

40 NQKITDKVDM LNQAVSVAKA TGDFSGVEQA LADLKNFSKE QLAQQAQKNE DFNTGKNSAL 840 

YQSVKNGVNG TLVGNGLSKA EATTLSKNFS DIKKELNAKL GNFNWltnSIlsrNG LENSTEPIYT 900 

QVAKKVKAKI DRLDQIASGL GDVGQAASFL LKRHDKVDDL SKVGLSANHE PIYATIDDLG 960 

GPF PLKRHDK VDDLSKVGLS REQKLTQKID NLNQAVSEAK ASHFDNLDQM IDKLKDSTKK 102 0 

NWNLYVESA KKVPTSLSAK LDNYATNSHT RINSNVKNGT INEKATGMLT QKNSEWLKLV 1080 

45 NDKIVAHNVG SAPLSAYDKI GFNQKMMKDY SDSFKFSTRL SNAVKDIKSG FVQFLTNIFS 1140 

MGSYSLMKAS VEHGVKNTNT KGGFQKS 1167 
<212> Type : PRT 
<211> Length : 1167 

SequenceName : SEQ ID 121 

SequenceDescription :



Sequence



<213> OrganismName : Helicobacter pylori J99 

<400> PreSequenceString :

MKTNGHFKDF AWKKCFLGAS WALLVGCSP HIIETNEVAL KLNYHPASEK VQALDEKILL 60 
LRPAFQYSDN IAKEYENKFK NQTTLKVEEI LQNQGYKVIN VDSSDKDDFS FAQ KKEGYLA 120 
VAMNGE I VLR PDPKRTIQKK SEPGLLFSTG LDKMEGVLIP AGFVKVTILE PMSGESLDSF 180 
TMDLSELDIQ EKFLKTTHSS HSGGLVSTMV KGTDNSNDAI KSALNKIFAS IMQEMDKKLT 240 

60 QRNLESYQKD AKELKNKRNR 260 
<212> Type : PRT 
<211> Length : 260 

SequenceName : SEQ ID 122 
SequenceDescription : 


Sequence 
<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString : 

MKSKLKLKRY LLFLPLLPLG TLSLANTYLL QDHNTLTPYT PFTTPLNGGL DWRAAHLHP 60 

SYELVDWKRV GDTKLVALVR SALVRVKFQD TTSSDQSNTN QNALS FDTQE SQKALNGSQS 12 0 

GSSDTSGSNS QDFASYVLIF KAAPRATWVF ERKIKLALPY VKQESQGSGD QGSNGKGSLY 180 

KTLQDLLVEQ PVTPYTPNAG LARVNGVAQD TVHFGSGQES SWNSQRSQKG LKNNPGPKAV 240 

TGFKLDKGRA YRKLNESWPV YEPLDSTKEG KGKDESSWKN SEKTTAENDA PLVGMVGSGA 3 00 

AGSASSLQGN GSNSSGLKSL LRSAPVSVPP SSTSNQTLSL SNPAPVGPQA WSQPAGGAT 3 60 

AAVSVNRTAS DTATFSKYLN TAQALHQMGV IVPGLEKWGG NNGTGWASR QDATSTNLPH 42 0 

AAGASQTGLG TGSPREPALT ATS QRAVTW AGPLRAGNS S ETDALPNVIT QLYHTSTAQL 4 80 

AYLNGQIWM GSDRVPSLWY WWGEDQESG KATWWAKTEL NWGTDKQKQF VENQLGFKDD 540 

SNSDSKNSNL KAQGLTQPAY LIAGLDWAD HLVFAAFKAG AVGYDMTTDS SASTYNQALA 60 0 

WSTTAGLDSD GGYKALVENT AGLNGPINGL FTLLDTFAYV TPVSGMKGGS QNNEEVQTTY 660 

FVKSDQKATA KIASLINASP LNSYGDDGVT VFDALGLNFN FKLNEERLPS RTDQLLVYGI 720

VNESELKSAR ENAQSTSDDN SNTKVKWTNT ASHYLPVPYY YSANFPEAGN RRRAE QRNGV 78 0 
KISTLESQAT DGFANSLLNF GTGLKAGVDP APVARGHKPN YSAVLLVRGG WRLNFNPDT 840 
DKLLDSTDKN SEPISFSYTP FGSAESAVDL TTLKDVTYIA ESGLWFYTFD NGEKPTYDGK 900 
QQQVKNRKGY AVITVSRTGI EFNEDANTTT LSQAPAALAV QNGIASSQDD LTGILPLSDE 960 

FSAVITKDQT WTGKVDIYKN TNGLFEKDDQ LSENVKRRDN GLVPIYNEGI VDIWGRVDFA 102 0 

ANSVLQARNL TDKTVDEVIN NPDILQSFFK FTPAFDNQRA MLVGE KTSDT TLTVKPKIEY 108 0 

LDGNFYGEDS KIAGIPLNID FPSRIFAGFA ALPSWVIPVS VGSSVGILLI LLILGLGIGI 1140 

PMYKVRKLQD SSFVDVFKKV DTLTTAVGSV YKKIITQTSV IKKAPSALKA ANNAAPKAPV 1200 

KPAAPTAPRP PVQPPKKA 1218 
<212> Type : PRT 
<211> Length : 1218 

SequenceName : SEQ ID 123 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString : 

MHQTKKTALS KSTWILILTA TASLATGLTV VGHFTSTTTT LKRQQFSYTR PDEVALRHTN 60 

AINPRLTPWT YRNTSFSSLP LTGENPGAWA LVRDNSAKGI TAGSGSQQTT YDPTRTEAAL 120 

TASTTFALRR YDLAGRALYD LDFSKLNPQT PTRDQTGQIT FNP FGGFGLS GAAPQQWNEV 18 0 

KNKVPVEVAQ DPSNPYRFAV LLVPRSWYY EQLQRGLGLP QQRTESGQNT STTGAMFGLK 240 

VKNAEADTAK SNEKLQGAEA TGSSTTSGSG QSTQRGGSSG DTKVKAIiKIE VKKKSDSEDN 3 00 

GQLQLEKNDL ANAPIKRSEE SGQSVQLKAD DFGTALSSSG SGGNSNPGSP TPWRPWLATE 360 

QIHKDLPKWS ASILILYDAP YARNRTAIDR VDHLDPKAMT ANYPPSWRTP KWNHHGLWDW 42 0 

KARDVLLQTT GFFNPRRHPE WFDGGQTVAD NEKTGFDVDN SENTKQGFQK EAD SDKS API 480 

ALPFEAYFAW IGNLTWFGQA LLVFGGNGHV TKSAHTAPLS IGVFRVRYNA TGTSATVTGW 54 0 

PYALLFSGMV NKQTDGLKDL PFNNNRWFEY VPRMAVAGAK FVGRELVLAG TITMGDTATV 600 

PRLLYDELES NLNLVAQGQG LLREDLQLFT PYGWANRPDL PIGAWSSSSS SSHNAPYYFH 660 

NNPDWQDRPI QNWDAFIKP WEDKNGKDDA KYIYPYRYSG MWAWQVYNWS NKLTDQPLSA 720 

DFVNENAYQP NSLFAAILNP ELLAALPDKV KYGKENEFAA NEYERFNQKL TVAPTQGTNW 78 0 

SHFSPTLSRF STGFNLVGSV LDQVLDYVPW IGNGYRYGNN HRGVDD I TAP QTSAGSSSGI 840 

STNTSGSRSF LPTFSNIGVG LKANVQATLG GSQTMITGGS PRRTLDQANL QLWTGAGWRN 90 0 

DKASSGQSDE NHTKFTSATG MDQQGQSGTS AGMPDSLKQD NISKSGDSLT TQDGNAI DQQ 960 

EATNYTNLPP NLTPTADWPN ALSFTNKNNA QRAQLFLRGL LGSIPVLVNR SGSDSNKFQA 1020 

TDQKWSYTDL HSDQTKLNLP AYGEVNGLLN PALVETYFGN TRAGGS GS NT TSSPGIGFKI 108 0 

PEQNNDSKAT LITPGLAWTP QDVGNLWSG TTVSFQLGGW LVTFTDFVKP RAGYLGLQLT 1140 

GLDASDATQR ALIWAPRPWA AFRGSWVNRL GRVESVWDLK GVWADQAQSD SQGSTTTATR 12 00 

NALPEHPNAL AFQVSWEAS AYKPNTSSGQ TQSTNSSPYL HLVKPKKVTQ SDKLDDDLKN 12 60 

LLDPNQVRTK LRQSFGTDHS TQPQPQSLKT TTPVFGTSSG NLSSVLSGGG AGGGSSGSGQ 132 0 

SGVDLSPVEK VSGWLVGQLP STSDGNTSST NNLAPNTNTG NDWGVGRLS ESNAAKMNDD 13 80 

VDGIVRTPLA ELLDGEGQTA DTGPQSVKFK SPDQIDFNRL FTHPVTDLFD PVTMLVYDQY 1440 

IPLFIDIPAS VNPKMVRLKV LSFDTNEQSL GLRLEFFKPD QDTQPNNNVQ VNPNNGDFLP 1500 

LLTASSQGPQ TLFSPFNQWP DYVLPLAITV PIWIVLSVT LGLAIGIPMH KNKQALKAGF 1560 

ALSNQKVDVL TKAVGSVFKE IINRTGISQA PKRLKQTSAA KPGAPRPPVP PKPGAPKPPV 1620 

QPPKKPA 1627 
<212> Type : PRT 
<211> Length : 1627 

SequenceName : SEQ ID 124 

SequenceDescription :



Sequence 
<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString : 

MGYKLKRWPL VAFTFTGIGL GWLAACSAL NTSNLFPRQN RSKQLIGFTE NNIIKPEAVL 60 

KAALAEDNGT ETILRVNFGE ALKSWYQNNK DRNIATRLTI FSENVEDEHD NLLDQKQQAE 120 

5 PINWPIELQK EYDQWGGSES SWKALKLYDR LIADFQSLIF SNIVANVQLT DGSDQFKPTT 180 

KDNLDSTSNK IKFVNSKPND PNGEFFANLQ AYLFAQWWE ENPLPLTQAF FAYQAP KDGL 240 

DSLYDQAAIG SALQLGYAFP AFREPNNGQS QGKTTFDPTP NSAQNFGDFI KAVFPEQKNG 3 00 

QTQQSNTSSR TGLFDWQTKW NTNGAANKLL VTKSNLRGAF KGVGLATAII DQYEYLVGGS 3 60 

KTSSLPEVKV DSNKSNQNPL DSFFMEGKDA VAIRSIVSRA KIAMTDQTPG FKVNPAFVKV 420 

10 KQSQQNDTFY QNQRKLSGGQ SGDNNSQGKH HYLQDAVRLT SSQAMAAAST GADSSSGTNV 480 

GGSSGGNSVL IPLPRSAALT HTQQQVQQTT STLQTPVYAR GDDGTYALAI DGGDYFLANN 540 

KRDFTKQADI LLYRYLQAKS NNFKENGVEF SLNLLESGSL FQTWAQTGLT AKLYGALVAM 600 

MGSGQGTQVK GSVQGS SRAA SVSVQTTQQN RQQSTDTQES EWKLAKS LL KSSADLAKPF 660 

TDNPTFKKAL TDIQSEYKDY LAAAGKLSEF KKDLGEVSGL jQQAIIDRADK YIQLEKQAQK. 720 

15 SAIGLGQPLP YQRASDGSYP ALEKFFIPED SAADGKVKAS ESGSAALVTL KTTDSQKSTN 780 

TVKQPDIKPT RENNDKKLKQ LTSDVETKAS SLITKWGATP QIGSQFSEIV SLKSKDNKPQ 840 

TNMILALLSD VGIKWTKILN SFKEWFFTNT NDFKNNYDSE KKELKGNEYK DFNDLVKQTL 90 0 

YLRSWQRLTS KEKFGYYKEL GSVKAQAAQS GMVSLSSSAA VANAVASSGM QKSGDQTLLE 960 

LGKKAFESEL EAS S SDGQYK YLRFLSTLMW LVKDGAKNYK RLLQQAITVG TRAFVSWTVS 102 0 

20 YDDTATASAA AAKAQVAVLK TAQATNTQSD NPFNKFVQNP DYVQGSETNW FNDKSTPIKP 1080 

DSLLESESTY NFTAEPFDDK TKSQKRSTGG TTNEKHFFGF NGLTINSPQS VSTASAGLTE 1140 

QIFNNFGQLV TS SDKS GALS QYKDKATLKR LIQNTNSDAE LNAFGEVLHR AVNVDTSNLG 1200 

RFNSSGEPLI SFDNKKKFLV DWDKLDDVY FNKFEGYVGQ TKVKMSDSSS SSQGTKTIRK 1260 

PKPHHSPRTR VSRLWAMSFR LPTRTLTKFL LVEKLIRTVL 1300 

<212> Type : PRT

<211> Length : 1300 

SequenceName : SEQ ID 125 
SequenceDescription : 

Sequence



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MKKLLIKPQF WFLTLGGFIS SSVILVACAT PSNSALQTVF KARSNQFFNG EQGSLQNALA 60 
35 TALKDPEANK QFVAAPLLKA LTAWYENNQD KQVTQFFKDT KKSVDEQYNQ AVDKWSASR 120 

NKNLFVQQDL LDSAGGVRNL KSPEWWTAH 150 

<212> Type : PRT 

<211> Length : 150 

SequenceName : SEQ ID 126 
SequenceDescription :



Sequence 



<213> OrganismName : Mycoplasma pneumoniae 

<400> PreSequenceString :

MQQQGETKDQ YNTFGLRLVR NSVGVSVLGL DGFVKFIKGG SGGSNGGSSS AKKIDKEEQK 60 

KFLKFRAFQA KIGTFYNTNF AFSFPLNETL KGW FDKHRGL ILANALVKVT LDTKEKASKA 120 

LVDAFSSYKN WLSEYTPVGL ATTMISFYFD QMKALNNKLL ERVRS LNQNV NQANPTPWLN 180 

GLSAKLPYVN TNGNYEKLNN YFTFLITKVL WPKVGTEDTN" VSEEKSKLKT KTEDVNKIRE 240 

50 KILNNIDSKL KTFVQKLKPT LAPRPAYSNV ILLNINNDKV WSAGANWSLA VLLDPKKVNP 3 00 

LSFMLLKQMF DQNSLFKKAK TLFENIQNKA KTSGSGKSGT TTNDDADALS KVTGNYYYNT 360 

WAKLTDKSIY GNLKDDKFDD LFKLAFDSSI NEKSFNVDYK AVIEHYRFIY TLEWLVDKNL 42 0 

KNFKDLLKAN LKFGEIAFIA YKNTETQNFS NPQGIFGSYF NYENETNAAK SATQIIDPNS 480 

FFYKTTTKPE AKTTQSANTA VMVQNTQMNN QQTNSYGFTG LSTSSGSMLG AATQQAILDQ 540 

55 ITKTSLQQYG SQADLKKIIG ETKNQLLLDR IANQLIALKP NTSGNSGTQK TIAAYFQTDA 600 

VGNPTLDFKA KQKLLLDVLD QYKDFFGNNA QAVQRDSGKS GTGNYLTYTD GSDKITYLQF 660 

SYKDIDGLSL SSSNGTSSKF ASDWAALLL FQAAYKGTQQ LALSSINKPQ LPIGDKRIKT 720 

GIDLLK 726 
<212> Type : PRT 

<211> Length : 726

SequenceName : SEQ ID 127 
SequenceDescription : 



Sequence 


<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString : 



MKSFLRKPKF WLLLLGGLST SSIILSACAT PSNSALQAVF KPTSNQFFNG EHGTIQSALN 60

TALRDPETNK KFVAAPLLKA LEAWYENNQD KNITQFLKDT KTNVDNQYKT WDKWSAPR 12 0 

WKSLFVQQDL LDSSGGSEAT WKARKLFEQL ISDFASRVFQ KNYLSYKENG KVSAGPFLYD 180 

TISKNSNWQN IVFDAVNFPE TNDDFFAKIQ SEVFDQWAEY TDPTIISSVT LKYSAPN 23 7 

<212> Type : PRT 
<211> Length : 237

SequenceName : SEQ ID 128 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae 

<400> PreSequenceString :

15 MINFLFNQMN ALNNKFLERA KALNQNVNQA NPTPWLNGLS AKLPYVRTNG NYEKDNNYFT 60 

FLIVKYMWKK VGNEDASLSK DSSINKLKTK TEDVNKIRDK ILEDIQKKVQ EFVKNKLKPT 120 

LAPRQTYSNV ILBNVNNDKV WSMGANWALA NLLDTSKINP LSFMLLKQTF DQNDLFKKAK 18 0 

KLFEDIQSKT NGGSSGGMQG SNTSSSEGAD ALSKVIGNYY YNSWAKLTDK SIYGNPKDNK 240 

FDDLFKLAFE DSINEKSFNV DYKAVIEHYR FIYTLEWLVN GNLKNFKDLL KANLKFGEIA 3 00 

20 FIAYKNTETK EFSNPQGVFG SAFNYENETN EVKIAAQNLD PNNFFYKTTT KPEEVKTAQM 360 

GASMMVMQQK MQSTMQDSNH YGFTGLNTST SSMLGAATQQ AILDQITKNS LQQYGSQQEL 42 0 

KTLIEKTNNQ LLLDRIASQL SGLNPSTTGN SNNGKGKNIA TYFQLDAIGN PTLSFQQKRK 480 

LLLDVLDQYK DFFGTNTQAA QRDSGKGGHG SYSTYQDGSD KITYLQFSYK DIDNLSLSDK 540 

GNSKLASDW AALLLFQAAD KGTQQLALSA IN 572 

<212> Type : PRT

<211> Length : 572 

SequenceName : SEQ ID 129 
SequenceDescription : 

Sequence



<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString : 

MKKFLRKPQF WKLTLGGFLS TSVILAACAT PSNSALQTVF KARSSQFFNG EQGSLQSALT 60 

35 TALKNPVANK QFIAAPLLKA L EAW YENNED KKITQFLKDT KSNVDSQYTT AVDKWSASR 120 

NKSLFVQQDIi LDNAGGS EAT WKAQKLLEQL ISDFASRVFQ KNYLNYKKDG QVSTGPFTYD 18 0 

ELHKEESWKN FEFSAPRFSE TNDDFFAKIQ SQVFDQWVEY TDPTLISQVN YKYSAPSQGL 240 

GQIYNREKLK DKLTPSYAFP FFAEEKDIAP NQNVGNKRWK QLVKGEGAIT DNNIGQSGTN 3 00 

SQKTGLLKYR NESNKGDFLD FPLNLSDTNE TKQLVDASNI VDQIiEAANLG AALNLKLQVF 3 60 

40 EQDNDELPQI KELKEDLNNT IWDKSKDVE KASKTNALFY NDQEGKQQQS DSDP I AGALD 420 

DIFAQNTSEG TNLSKLAEQV KKAAATKMEA KTAVLRTNNS KGQQNNYWL DAAIPTFNST 480 

TSKSKNNSAS NEVLVALKSG SINLRQVQQT DQNSYSPIKF RIVRNSTGVT VFGLDGGSYY 540 

LKQDSTNKKS VSKQSLTLLT KSSSGNSNKV LRDLDKQKQF LKFRAFQAKT NTFYSTNFAF 600 

SFPLNETLKS WFDKHRELIL ANALVNASLD QKDKASKALT EAFNPYKELI KEFAPVALAT 660 

45 TMISFYFDQM KALNNKLLER ARNLNQNVNQ ANPTPWLNGL SAKLPYVNTN GNYEKLNNYF 720 

TFLITKTLWP KVGQEETSIS EESNKLKTKT ADVDKI RDKI LENIQTKVND FVKNKLKPAL 78 0 

APRPAYSNVI LLNVNNDKVL SSGANWSLAS LLQSDKVNPL SFMLLKQAFD NNDLFKKAQK 840 

LFKDIQEKSS NNGGMQSSST TNSDADALSK VIGNYYYTTW AKLTDKS I YG NPKDNKFDEL 900 

FKLAFEASID EKSFNVDYKA VIDHYRFIYT LQWLVDQKLK NFKSLLKTNL KFGEVAF I A Y 9 60 

50 KNTETTNFSN PQGVFGSYFN YENS AS EVKE STQTLDPNNF FYKTTTKPTV QAIQQVASLA 1020 

LVQKQQMQQN STDHYGFTGL STSTSSMFDA SSRDAILQQI TKTSLQQYGS KDQLKKIIQG 10 80 

TNNQLLLDRI AVQLSGLNPS TTNGGSGKTI ATYFQVDAVG NPTLDFQAKR KLLLDLLDQY 1140 

QNYFGNGAQK SQRDSTPSGT GNYLTYQNGS DKYTYTQFTY QDIDSLSLTT TSGTNNKIAS 1200 

DWAALLLFQ AADKGTQ QLA LSAINKPQLN IGDKRIESGL KLLK 1244 

<212> Type : PRT

<211> Length : 1244 

SequenceName : SEQ ID 130 
SequenceDescription : 

Sequence



<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString : 

MVGSGAAGSA SSLQGNGSNS SGLKSLLRSA PVSVPPSSTS NQTLSLSNPA PVGPQAWSQ 60 

65 PAGGATAAVS VNRTASDTAT FSKYLNTAQA LHQMGVIVPG LEKWGGNNGT GWASRRDAT 120 

STNLPHAAGA SQTGLGTGSP REPALTATSQ RAVTWAGPL RAGNSSETDA LPNVITQLYH 180 

TSTAQLAYLN GQIWMSSAR VPSLWYWWG EDQESGKATW WAKTELNWGT DKQKQFVENQ 240 
LGFKDDSNSD SKNSNLKTQG LTQPAYLIAG LDWADHLVF AAFKAGAVGY DMTTDSNAST 300

YNQALVWSTT AGLDSDGGTR LW 3 22 

<212> Type : PRT 
<211> Length : 322 

SequenceName : SEQ ID 131 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString : 

MPVFLKLTHT IRKVLRVARL SRLAIiLSLTA VIFSGCANIN LISAVGSSSV QPLLSKLSSH 60 
YVLNHNDKDN LVEISVQAGG SSAGVKAITK GLADIGNVSK NTKSYAEENK QLWMDKKLKT 120 

JTIiGKDAIAV IYKAPSEFKG KLVLTKDNLN DLYDLFAGSK SVDINKFVEN GQTXKNSNHN 180

15 LIGFPRTGGA FASGTAEAFL KFSGLTQTKT LDKDSKEILE GQRNYGPNAR PTSETNIEAF 240 
NT FVTTLRQ P NLYGMVYLSL GFVNNNMNLI KSEGFEVLKV KYDNNAVTPS SQAVSSNTYK 3 00 

WVRPLNSWS LLPKQKTLPS IQRFFNWLLF SNNSEIKKIY DDFGVLELTA DEKKKMFKTG 3 60 

NAEMSNIANF WVDDYSLNNQ TFGAL 385 
<212> Type : PRT 
<211> Length : 385

SequenceName : SEQ ID 132 

SequenceDescription : 



Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFAVLPPEI NSARLYVGAG LAPMLDAAAA WDGLADELGS AAASFSAVTA GLAGS S WLGA 60 

AS TAMTGAAA PYLGWLSAAA AQAQQAATQT RLAAAAFEAA LAATVHPAII SANRALFVSL 12 0 

30 WSNLLGQNA PAIAATEAAY EQMWAQDVAA MFGYHAGASA AVSALTPFGQ ALPTVAGGGA 180 

LVSAAAAQVT TRVFRNLGLA NVGEGNVGNG NVGNFNLGSA NIGNGNIGSG NIGSSNIGFG 240 

NVGPGLTAAL NNIGFGNTGS NNI GFGNTGS NNI GFGNTGD GNRGIGLTGS GLL GFGGLNS 3 00 

GTGNIGLFNS GTGNVGIGNS GTGNWGI GNS GNS YNTGFGN SGDANTGFFN SGIANTGVGN 360 

AGNYNTGSYN PGNSNTGGFN MGQYNTGYLN SGNYNTGLAN SGNVNTGAFI TGNFNNGFLW 42 0 

35 RGDHQGLIFG SPGFFNSTSA PSSGFFNSGA GSASGFLNSG ANNSGFFNSS SGAIGNSGLA 480 

NAGVLVSGVI NSGNTVSGLF NMSLVAITTP ALISGFFNTG SNMSGFFGGP PVFNLGLANR 540 

GWNILGNAN IGNYNILGSG NVGDFNILGS GNLGSQNILG SGNVGSFNIG SGNIGVFNVG 60 0 

SGSLGNYNIG SGNLGIYNIG FGNVGDYNVG FGNAGDFNQG FANTGNNNIG FANTGNNNIG 660 

IGLSGDNQQG FNIASGWNSG TGNSGLFNSG TNNVGIFNAG TGNVGIANSG TGNWGIGNPG 720 

40 TDNTGI LMAG SYNTGILNAG DFNTGFYNTG SYNTGGFNVG NTNTGNFNVG DTNTGSYNPG 78 0 

DTNTGFFNPG NVNTGAFDTG DFNNGFLVAG DNQGQIAIDL SVTTPFIPIN EQMVIDVHNV 84 0 

MTFGGNMITV TEASTVFPQT FYLSGLFFFG PVNLSASTLT VPTITLTIGG PTVTVPISIV 90 0 

GALESRTITF LKIDPAPGIG NSTTNPSSGF FNSGTGGTSG FQNVGGGSSG VWNSGLSSAI 960 

GNSGFQNLGS LQSGWANLGN SVSGFFNTST VNLSTPANVS GLNNIGTNLS GVFRGPTGTI 102 0 

45 FNAGLANLGQ LNIGSANLGD FNLGS GNVGS FNVFSGNQGS YNIGPANLGN YNIGFANLGN 1080 

YNIGFGNAGD FNQGFANTGN NNIGFANTGN NNIGIGLSGD NQQGFNFAGG WNSGTANIGL 1140 

FNSGTNNVGI GNSGTGNWGI GNSGSGNTGI GNTGSTNTGF FNTGIVNTGV ANAGS YNTGW 1200 

YNTGDTNTGI ANLGDFNTGF YNTGNFSTGF ANQGD I ATGA F I TGDMGNGA FWRGDQQGLF 12 60 

SAGYRVHVPE IPAHVTVEVP VNIPITASFT NTVYSGITLE QINFGFTIDI AGIPLLAGAI 1320 

50 SKAVLPPITG TGPAITVNIG DPGGSTAIRI PATASVGPFD VTFVNIAATT GFFNATTDPS 13 8 0 

SGFFNGGPGT VSGIANIGAN I SGFQNVANS ATSGFNNYGS LQSGLANLGD TVS GVFNTGI 1440 

GAPANVSGMF NIGSNLAGFF HDQATGMSMF NLGLGNIGQF NVGFSNVGDS NAGLANIGSF 15 0 0 

NLGSGNLGSF NVFGGNQGSY NIGPANLGNY NIGLGNLGSY NFGFGNAGDF NLGFANTGNN 1560 

NIGFANTGNN NIGIGLSGDN QQGFNFAGGW NSGSGNSGLF NSGTNNIGLF NSGTGNIGIG 1620 

55 NS GTGNWGI A NTGDTNTGIF NTGDVNTGLL NAGNVNTGIF NTGHYNTGSF NAGS FNTAGF 1680 

NPGSYNTGYL NTGS YNTGL A NSGDVNTGGF ITGNYSNGFW WRGDYQGLAG ISQTITVPDT 1740 

AVPVKLHVPI FLDIPVTGTL GTFTVHGFRF PEITGDIFLI GIPFNAATLD AFSFPNISIV 180 0 

LPNIGINLGS GPD PL I D I AG TGGLLPIKIP LIDIPAAPGF GNSTTTPSSG FFNAGTGTVS 18 60 

GVGNVGSNSS GFFNLTSGSS GISGVQNFGE LISGGFNFGN TVS GL VNA S T LGLSMPANLS 1920 

60 GGGNVGATVA GFVNNTQ I LN LGFGNVGSGN VGHGNIGDSN VGLGNLGNAN VGHGNIGSFN 1980 

VFSGNRGSYN IGPANLGNYN IGLGNLGSYN FGFGNAGDFN LGFANSGSNN IGFANTGNNN 204 0 

IGIGLSGHNQ QGFGSWNSGT ANTGLFNSGT NNIGLFNSGT GNIGIGNSGI GNTGIGNPGV 2100 

GNTGLGNSGT GNWGLWNPGT GNMGVANVGT YNTGGYNVGS TNTGIANVGI ANTGS YNTGS 2160 

TNTGSFNDGD FNTGFYNTGD YNTGFYNTGD VNTGAF I GGN FSNGAFWQSD HQ GQWGAHYA 222 0 

65 ITVPQIPLLN FSLNIPVNIP IHLDFGTLAV NGFQIPAITL RALGVTHF S V GPIIVPRIAG 22 80 

TLPVIDINIG DPGGSSSIPI TITSGAGPW IPLLDIPPAP GFGNSTTGPS SGFFNSGTGS 2340 

SSGFGNVGAN NSGFWNTAFA GIGNSGLQNF GSLQSGWANL GNTVSGFYNT SAADFATPAN 24 00 
LSGLSNVGAD LTGVLRGPNG STFNAGLANL GQFNVGSANL GSANLGSANL GSANLGNSNV 2460

GFGNIGNANI GGANIGDFNV GIANTGPGLT AAVNNIGIGN TGNYNIGVGN TGNYNIGFGN 2520

TGNNNIGIGL SGDNQIGFGP LNAGIANMGL FNLGDNNFGM ANAGNFNQGI ANTGNNNIGL 2580

FNTGNNNVGI WLTGDGLSGF SSLNSGAGNT GFFNSGTANT GLFNSGTGNT GLFNSGTGNV 2640

GIGNMGTGGF GVGLSGDSQV GIGGTNSGSF NIGLFNSGTG NVGIGNSGTG NVGIGNTGTG 2700

NTGIGNSGNY NTGLLNAGLV NTGIANPGNH NTGLFNIGTF NTGIANPGHY NTGSYNTGSY 2760

NTGMANAGDY GTGAFITGSM NNGLLWRADR QGLLAANYTI TIERPAAFLM VDIPVNIPIT 2820

GDITNVSIPA ITFPRIDASG SVDIGILSGT VLAPVGPITL HGGDASAPLD TPIEIDFGPS 2880

PAINLNIGKP DGSTVINIVG GAGAGPISIP IIDLRPAPGF FNATTGPSSG FLNWGAGSAS 2940

GLLNFGNNSG LYNFATSSMG NSGFQNYGSL QSGWANLGNS ISGIYNTGLG APANVSGLLN 3000

IGTNLAGWLQ NGPTETTFSV GLANLGFWNL GSANIGNYNL GSANIGVYNL GSANIGDFNL 3060

GSANIGDFNL GSANIGSSNI GFGNVGPGLT AAIGNIGFGN TGNGNIGIGN TGTGNIGFGN 3120

TGNGNIGIGL TGDTMTGFGG WNSGTGNIGL FNSGTGNIGF GNSGTGNWGI GNSGDYNTGI 3180

GNTGSTNSGF FNTGLVNTGI GNSGDYNTGL FNAGNTNTGS FNPGDYNTGG FNPGNYNTGY 3240

FNPGNSNTGI ANSGDVNTGA FNSGNYSNGF FWRGDYQGLG GFAYQSAVSE IPWSYDRFQH 3300



<212> Type : PRT
<211> Length : 3300

SequenceName : SEQ ID 133
SequenceDescription :



Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MNLVSTTSGM SGFLNVGALG SGVANVGNTI SGIYNVGTSD LSTPAVNSGL ANIGTNIAGL 60 

LRDGAGTAAI NLGLANHGNL NVGFAS LGGF NFGGATIGHN NVGIGNTGIF DVGLANLGSY X2 0 

NIGFGNLGDD NXiGFGNFGSY NIGFGNVGND NLGFANAGGG NIGFANTGSN NVGFGNTGSN X80 

NVGIGLTGNG QIGFGSFNSG SGNIGLFNSG SNNIGFFNSG SGNFGIANSG SFNTGIGNTG 240 

30 NTNTGLFNSG DVNTGAFNPG SFNTGSFNTG SFNTGGFNPG NTNTGYLNIG NYNTGIANTG 3 00 

DVDTGAF I TG NYSNGLFLSG DYQGLVGLNL VIDMPLPISL GVNIPIDIPT TASAGNITLM 3 60 

GVTIPPTGDI VLSSIAGQRA HFGPITXPNX TWGPTTTVA I GGPNTAI T X TGGGAI RI PL 42 0 

ISXPAAPGFG NSTTNPSSGF FNTGAGGASG FGNFGGANSG FWNLASATSG ASGLLNVGAL 480 

GSGLANVGTT VSGFYNTSTS DLATPAFNSG LANISTSIAG LLRDSTGTMV LNLGLANHGT 540 

35 LNVGIANLGD YNIGFANLGS ANFGSANIGG NNIGGANTGX FDIGLANLGS YNI GFGNFGD 60 0 

DNLGFGNLGS YNVGFGNLGN DNLGFANTGS NNIGFANTGS NNXGIGLTGD GQIGFGSLNS 660 

GSGNIGLFNS GSGNIGFFNS GNGNVGIGNT GTANFGLGNT GSTNTGFFNS GDWTGIGNT 720

GSFNTGSFNP GDSNTGDFNP GSYNTGLGNT GDVDTGAF X S GSYSNGFLWS GMYQGL I GLH 780 

AALAXPEIAL TFGVDIPIHI PINXDAGWT LQGFS IVAAE NNIDFTPIII PTINITLPTA 840 

40 AITVGGPTTS IGITASAGIG SITXPIIDIP ATSGFGNSTT SPSSGFFNSG AGSASGFLNV 90 0 

VAGASGISGY LNVGALGSGV TNVGHTVSGF YNASALDLVT PAFAS GLMRD GMGTMTLNLG 960 

LANLGSNNAG F GNTGI FDVG VANLGNYNIG FGNFGDDNLG FANLGSYNIG VANTGSNNIG 102 0 

FANTGSNNIG IGLTGTGQIG IGALNSGSGN IGLFNSGDGN IGFFNSGTGN FGIGNTGTGN 1080 

FGIGNSGSTS TGLFNSGDGN TGGFNPGNFN TGNFNTGSFN TGGFMAGNTN TGHFNTGNYN 1140 

45 TG I ANTGDVS TGAF I SGNYS NGILWRGDYQ GLIGYSYALT IPEIPAHLDV NIPIDIPITG 120 0 

SFTDLWDNF TIPIIGFESF AFSFHIHTEP DIGPIIVPSF VLSVPTFAIA VGGPTTAINI 1260 

SATAGLGPIT IPIIDIPAAP GIGNSTTSPS SGFFNTGAGT ASGFGNVGGN TSGLWNLASA 1320 

ASGVSGLLNV GALGSGVANV GNTISGIYNT SPLDLGTPAF GSGLANIAGL LQGGAGTTIL 13 80 

DLAGLGNLNV GLANLGGSNF GIGNTGIFNV GFANVGNHNI GLANLGNYSV GFANS GNYHI 1440 

50 GIANTGSANX GFANTGSGNI GIGLTGTGQX GFGSFNSGSH NIGLFNSGDG NVGFFNSGTG 15 00 

NVGIGNTGTA NFGIANSGSF NTGLGNTGST NTGLFNPGNV NTGVGNTGSI NTGSINTGSF 1560 

NTGSTNTGSF NLGDHNTGSF NSGDYNTGYF NAGDYNTGVA NTGNVNTGAF ISGNYSNGFF 1620 

WRGDYQGLIG LSTTITIPEI PYRYDLSVPI DIPITGTWA TTPNSFTIPG FQIRVLLGPA 16 8 0 

AVLVNEMIGP ITIDVNQVIA IDSPIQQTIS MVGTGGFGPI PIGISIGGTP GFGNSTTGPS 1740 

55 SGFFHTGAGH VSGFGNFGAG NMSGSGNFGA GNSGFFNAGG LGNSGLLNFG ALQSGLANLG 18 00 

NTISGVYNTS TLDLATPAFG SGIANIGANL AGLFLDNTGN LTLNFGVANQ GGLNAGI GNL 1860 

GSVNIGFVNT GDSNLGIGNL GDLNFGGVNI GGNNI GIANT GIFDIGLANL GSYNIGLANL 192 0 

GDDNLGFGNA GSYNIGFANF GSDNLGFANT GSYNIGFANT GNNNI GVGLT GNGQIGIGSL 1980 

NSGSNNIGLF NSGSGNIGFF NSGTGNVGIF NTGTGNFGLA NSGGFNTGIG NAGS TNTGVF 2040 

60 NPGDLNTGSF NPGSFNTGGF NPGSGNTGYL NTGDYNTGVA NTGDVDTGAF ITGSYSNGFL 2100 

VSGDYQGLIG LPLLGIPVTP GYFNLTGGPS SGFFNSGAGS VS GFVNS GAG LSGYLNTGAL 2160 

GSGVANVGNT ISGWLNASAL DLATPGFLSG I GNFGTNLAG FFRG 22 04 
<212> Type : PRT
<211> Length : 2204

SequenceName : SEQ ID 134

SequenceDescription :
<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

5 MSFVLIAPEF VTAAAGDLTN LGSSISAANA SAASATTQVL AAGADEVSAR IAALFGGFGL 60 

EYQAISAQVA AYHQRFVQAL STGAGAYASA EAAAAEQIVL GVTNAPTQAL LGRPLIGDGA 12 0 

NATTPGGAGG AGGLLFGNGG AGAAGAP GQA GGPGGPAGLW GNGGP GGAGG SGGGTGGAGG 180 

AGGWLFGVGG AGGVGGAGGG TGGAGGPGGL IWGGGGAGGV GGAGGGTGGA GGRAELLFGA 240 

GGAGGAGTDG GPGATGGTGG HGGVGGDGGW LAP GGAGGAG GQGGAGGAGS DGGALGGTGG 3 00 

10 TGGTGGAGGA GGRGALLLGA GGQGGLGGAG GQGGTGGAGG DGVLGGVGGT GGKGGVGGVA 360 

GL GGAGGAAG QLFSAGGAAG AVGVGGTGGQ GGAGGAGAAG ADAPASTGLT GGTGFAGGAG 42 0 

GVGGQGGNAI AGGINGSGGA GGTGGQGGAG GMGGS GADNA SGXGADGGAG GTGGNAGAGG 480 

AGGAAGTGGT GGWGAAGKA G I GGTGGQGG AGGAGSAGTD ATATGATGGT GFS GGAGGAG 540 

GAGGNTGVGG TNGSGGQGGT GGAGGAGGAG GVGADNPTGI GGTGGTGGKG GAGGAGGQGG 600

15 S S GAGGTNGS GGAGGTGGQG GAGGAGGAGA DNPTGIGGAG GTGGTGGAAG AGGAGGAIGT 660 

GGTGGAVGSV GNAGIGGTGG TGGVGGAGGA GAAAAAGS S A TGGAGFAGGA GGEGGAGGNS 72 0 

GVGGTNGSGG AGGAGGKGGT GGAGGSGADN PTGAGFAGGA GGTGGAAGAG GAGGATGTGG 7 80 

TGGWGATGS AGIGGAGGRG GDGGDGASGL GLGLSGFDGG QGGQGGAGGS AGAGGINGAG" 840 

GAGGNGGDGG DGATGAAGLG DNGGVGGDGG AGGAAGNGGN AGVGLTAKAG DGGAAGNGGN 900 

20 GGAGGAGGAG DNNFNGGQGG AGGQGGQGGL GGASTTSINA NGGAGGNGGT GGKGGAGGAG 960 

TLGVGGS GGT GGDGGDAGSG GGGGFGGAAG KAGGGGNGGR GGDGGDGASG LGLGLSGFDG 1020 

GQGGQGGAGG SAGAGGINGA GGAGGNGGDG GDGATGAAGL GDNGGVGGDG GAGGAAGNGG 10 80 

NAGVGLTAKA GDGGAAGNGG NGGAGGAGGA GDNNFNGGQG GAGGQGGQGG LGGASTTSIN 1140 

ANGGAGGNGG TGGKGGAGGA GTLGVGGSGG TGGDGGDAGS GGGGGFGGAA GKAGGGGNGG 1200 

25 VGGDGGEGAS GLGLGLSGFD GGQGGQGGAG GSAGAGGING AGGAGGTGGA GGDGAPATLI 12 60 

GGPDGGDGGQ GGIGGDGGNA GFGAGVPGDG GDGGNAGFGA GVPGDGGIGG TGGAGGAGGA 1320 

GADGDPS IDG GQGGAGGHGG QGGKGGLNST GLASAASGDG GNGGAGGAGG NGGDGDGFIG 13 80 

GSGGTGGTGG DAGVGGLANT GGTAGNAGIG GAGGRGGDGG AGDSGALSQD GNGFAGGQGG 1440 

QGGVGGNAGA GGINGAGGTG GTGGAGGDGQ NGTTGVAS EG GAGGQGGDGG QGGIGGAGGN 15 0 0 

30 AGFGAGVPGD GGI GGTGGAG GAGGAGADGD PSIDGGQGGA GGHGGQGGKG GLNSTGLASA 1560 

ASGDGGNGGA GGAGGNGGDG DGFIGGSGGT GGTGGDAGVG GLANTGGTAG NAGI GGAGGR 1620 

GGDGGAGDSG ALS QDGNGFA GGQGGQ GGVG GNAGAGGING AGGTGGTGGA GGDGQNGTTG 1680 

VASEGGAGGQ GGDGGQGGIG GAGGNAGFGA GVPGDGGIGG TGGAGGAGGA GADGDPSIDG 1740 

GQGGAGGHGG QGGKGGLNST GLASAASGDG GNGGAGGAGG NGGAGGLGGG GGTGGTNGNG 1800 

35 GLGGGGGNGG AGGAGGTPTG SGTEGTGGDG GDAGAGGNGG SATGVGNGGN GGDGGNGGDG 1860 

GNGAPGGFGG GAGAGGLGGS GAGGGTDGDD GNGGSPGTDG S 1901 
<212> Type : PRT 
<211> Length : 1901

SequenceName : SEQ ID 135

SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MSLVIVAPET VAAAALDVAR IGSSIGAANA AAAGSTTSVL AAGADEVSAA IATLFGSHAR 60 

EYQAISTQVA AFHDRFAQTL SAAVGSYVSA EATNAAPLAT LEHNVLNALN APTQALLGRP 120 

LIGDGAAGAP GTGQAGGAGG I LWGNGGAGG SGAPGQVGGA GGAAGL FGTG GAGGAGGAGA 180 

AGGAGGSGGW LLGNGGVGGA GGQSLLGGAT GGAGGNAGLF GVGGTGGPGG PGGPGGVGGT 240 

50 GGAGGLGGTL YGAGGHGGAG GPGPIGGVGG HGGVGGAAGL LGVGGHGGAG GHGAE GVAGA 3 00 

AGEDLSPHGT SGGVGGDAGD GGTGGRGGWL AGAGGAGGAG GVGGTGGAGG AGFSRALIVA 3 60 

GDNGGDP GAG GAGGTGGAGS T I GAHGAAG A SPTSGGNGGA GGNGAHFSSG GKAGGNGGAG 420 

GAGGLVGNGG AGGAGGNGAP GAPPSGGDPN GGGGGAGGAG GKGGDGGAQA GDGGAGGAGG 48 0 

KGGNGGNGAT GATGLNGLGA GADGTDGGKG GNGGAGGGGG AGGQGGKALA ATHQDGSMGA 540 

55 GGAGGNGGAG GMGGD GGNGA KGTFDNGGDG VGGNGGNGGS RGI GGAGGIG GAGS TAGADG 600 

ARGATPTSGG NGGTGGNGAN ATVAGGAGGA GGKGGNGGLV GNGGAGGKGG DGMAGVAGSS 660 

PTTAGESGTS GQNGGAGGAG GAGGRGGDFG GDGGTGGAGG NGANGANATT PGAKGGDGGH 72 0 

GGPGAQGGNG GQGGPGGLAG NLFGQNGIQG VGGS GGKGGA GGLAGDGGNG ANGNFAFGDG 78 0 

NGGHGGNGGN PGAGGQGGSG GAGS TP GAKG AHGFTPTSGG DGGDGGNGGN SQWGGNGGD 840 

60 GGNGGNGGSA GTGGNGGRGG DGAFGGMSAN ATNPGENGPN GNP GGNGGAG GAGGAGLNGG 900 

NGGAGGNGGL GGFGGNGAAG ANGVAVGAPG QPGGAGGHGG AGGNGGAGGN GGQGWSDGA 960 

GGAGGAGGDG GAP GDGANGG NGQGAGAFAG GGGGRGGDGG NAGNAGAGGP GGTGS TAGKA 1020 

GPAGSILHDG GNGGHGGHGA AS GGNGGP GG HGGNGGNGGT GANGGNGGIG GTGGAGSTGA 1080 

KGVLGTNEGD GGD GGRGGNG GRGGNGGQGL TGAGGNGGTG GTP GNGGNGG NGASGDLVTS 1140 

65 PGDGGGGGRG GDAGRGGDAG LGGSSGPGGT PGDWGTGGTG GTGGTGGQGA NGGLTGGRGG 1200 

TGGNGGNGNT GGTGGAGGTG GTGHNGSQPG MGGNGGAGGF GGNGFAGVGG RGGMGGSGGT 12 60 

GGTGDAGPFG TGTGGTGGHG GQGGGGGFSI LLGLGGLGGL GSPGSIATGT AGGAGGGGGF 132 0 



GGLGGGEFV 132 9 

<212> Type : PRT 
<211> Length : 1329 

SequenceName : SEQ ID 136

SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MSYVIATPEM MATAAFDLAR IGSQVSAASA VAAMPTTEW AAGADEVSAG IAALFSAHAQ 60 

EYQALSAQAA AFHDQFVHTL TAAARWYTAT EIANAAAMRV VLGAVNAPTQ TLLGRPLIGD 12 0 

GAHGTAPGQP GGAGGLLFGN" GGNGAAGAVG QVGGAGGAAG LFGIGGAGGA GGAGAPGGTG 18 0 

GTGGWIAGGG GVGGMGGAGG GAGGAGGNAG LFGNGGAGGA GGAGGGAGGA GGNAGWFGHG 240

GAGGVGGVGA AGANGATPGQ DGAAGVAGSD DGAGGDGLAG SDGGDGGAGG VGGNGGRGGW 300

LLGNGGAGGV GGVGGAGGAG AAGGAGGAGA TGINGPAGIS AAGGDGGAGG NGGAGGNGGV 3 60 

GGAGGAGGSA GLLGYVGRAG DGGAGGGGGL GGAP GDGGAG GNGGSWLAAG DGGAGGHGGD 42 0 

PGLGGAGGAG GASGGAGARA GANGLAAGND G PVS GGMGGK GGNGAHAPVA GGHGGNGGAG 48 0 

GNGGLVGDGG AGGHGGDGAA GAGYADMTAI FLGSSGTPGE DGGNGGAGGA GGAGGAHAGD 540 

20 GGAGGAGGNG GAGGAGGNGA HGFNAVLVSD GGNGGDGGAG GRGGDGGAGG AGGDAPAGRA 600 

GSQGVGGDGG AGGAGGAPGN GGSGGRGDMA FKDGDGGAGG DGGDPGAGGK GGAGGAGATE 660 

GVTGATGATV HSGGNGGKGG NGADATVAGA NGGKGGAGGN GGLVGDGGAG GDGGS GAAGA 72 0 

NGANVGEDGA DGTLSGQPGE GSEANGGQGG VGGGGAGGAG GDGGAGS SAL GSGGNGGRGD 78 0 

AGQAGGAGGA GGAGGAGGSV SGDGGPGGKG GAGGAGGAGA SGGGGGKGAS GADSAEAVGG 84 0 

25 AGGKGGD GGV GGVGGD GGPG GDGGAGGAAP AGQVGSHGVG GVGGDGGLGG AGGNGGD GGH 90 0 

GSDGGDGGDG GDPGAGGLGG LGGDSGNGTR AASGVDASDH GPGSGGNGGN GGNGAQASVA 960 

GGAGGNGGDG GNAGRVGDGG AGGNGGDGAA GANGANS GAP GSDALALGQP GGMGGQGDAG 1020 

QAGGAGGAGG AGGAGGSVSG DGGAGGNGGA GGNGGVGASG GAGARGANGI DS I GGTGGAG 10 8 0 

GGGGDGGAGG VGGHGGDGGV GGAAPSGTVG SHGTGGVGGD GGLGGAGGVG GAGGNGGIGI 1140 

30 TVGGAGGAGG NGGDPGAGGR GGLGGDSGNG TSAANGVDAS KHGPLTGGDG GVGGNGAKAA 12 0 0 

AAGGDGGQGG DGGNAGLFGD GGAGGDGADG TAAEALGGDG GAGGAGGKGG DAGDIGDGGD 1260 

GGKGGDGAHG ALGGLTVAGG NGGAGGAGGA GGAGGAFLGD GGNGGAGGQG GAGRGGSPGG 13 20 

GGGVGGHGGA GGDAGMNGGG GTGGQGGNGA AGGAGWSPDS DLKGFDGFDG GSGGAGGDGG 13 80 

AGGAGGTQTG DGGDGGAGGL GGAGGVGGNG VDGFDINETT GRDGGD GGDG GYGGWGGAGG 1440 

35 NGGAGGSAPA GEVGMRGVGG DGGDGGSGGD AGNGGLGGDG FTYLADFDGE PGGDGGDGGD 1500 

GGWGRP GGQG GFGSTSGAHG KAGFGAPGGD GGD GGNGGHG GDGNGSFADA GDGGPGGNGG 1560 

NGGLGGAGRD GGAPGGD GGD GGTGGS GGFG APPPRSIGGG DGGDGGRGGD GGRGAGGLTS 1620 

GGVGSSGESG GSGNGRGDPG SGGSGGEGGE GGPSISVNVT 1660 
<212> Type : PRT 

<211> Length : 1660

SequenceName : SEQ ID 137 
SequenceDescription : 



Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :

MSFVLVSPET VAAVATDLKR IGASLAHENA SAAASTTAW SAAADEVSTA VAALFSQHAQ 60

GYQAAAAQVA AFHSRFVQAL TAGAGAYAFA EAANASPLQS AMGAVSASAQ TLLSRPLIGN 120

GANATTPGGN GGDGGWLFGS GGNGAPGAAG QSGGNGGSAG LWGNGGAGGA GGSGGAAGGN 180

GGNGGWLFGA GGTGGIGGTG APGAMGGTGG NGGNGALLIG GGGLGGAGGM GGTGGGTGGT 240

GGNGGNGALL IGAGGVGGAG GIGGQGTGAG GAAGAGGTGG NGGAGGLFMN GGDGGAGGQG 300

GDGAAGDAAA SAGGTGGKGG QGGDGGTGGA GGAGPVLFGH GGAGGMGGQG GTGGMGGAGG 360

DGTTVIAAGT GGEGGTGGAA GAGGAAGARG ALTSGGLAGG VGAGGTGGTG GTGGNGADAA 420

AWGFGANGD PGFAGGKGGN GGIGGAAVTG GVAGDGGTGG KGGTGGAGGA GNDAGSTGNP 480

GGKGGDGGIG GAGGAGGAAG TGNGGHAGNT GDGGDGGTGG NGGNGTGGVN GADNTLNPDT 540

PGGAGEPGGA GGAGGAGGAA GGPGGTGGTG GNGGNGGNGG NGGNGGNGGN GGNAGNNSTN 600

APVGGEGGAG GDGGAGGAGG AANGGTAGSQ GTGGVGGDGG AGGNGGGGKA GTGNSGNFGV 660

DGEAGFSGGA GGNGGVGGAA GANGGTGGSG GNGGDGGAGG IGGAGGNGIP GTGTEPAGGT 720

GAKGGDGGDG GAGGAGGNAG GAGGQGGNAG QGGAGGAGGN AVIPGDGVGK APHGDAGGSG 780

GDGGKGGQGG SGGTGGSGAP IGGGAGGTGG SGGHAGKGGA GGIGAQGTTI TVPGNGGNAG 840

DGGNGGNAGA GGNGGSGDFG GNTTSGASGS GGNGGNAGTA GSGGAGGTGG TGLSGGNGGN 900

GGNGGNGGDG GNGAHGTVGA QFVPATSLPT PNGGAGGNGG TGSNGGAPGP AGAPGPTTGG 960

NAGSQGIGGD GGNGGDGGKG GDGADAVNW FMPTEPQAAT GTAGSAGDPT GGNGGPGTPG 1020

SPMVAPPPPT PITQVQQGGD GGAGGTGSTN ANDGTATGGK GGEGGVGSIL GGPGGNGGTG 1080

GNASATGTNG VANAGNGGKG GDGGQFGAGG NGGAGGSVTD GSAGSTAGNG GNGGNATNGT 1140

IAGQPAGGNG SAGGKGGDGG NIAAGATGTA GNGGNGGNGN DGAVNAGTGG SGGNGGNAGG 1200

GGANGGDGGA GGAGGAGGRG GKGIDGGFGG DGGNGGSNNG TGAGGNGGNG GTGGVGSVGA 1260

AGGDGGNGGT GGFAGFGGTA GNGGSGGTGG AGGDGGTGGD GGNGVIAGGG GTGGNGGASG 1320

AGGAGGTGGF AGNGNAGGNG GTGGASEDGD NGNAGSGATG GTGGNGGTGG DGGAAGLGGV 1380

A 1381 
<212> Type : PRT

<211> Length : 1381

SequenceName : SEQ ID 138

SequenceDescription :

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MSFVIATPEM LTTAATDLAK IGSXITAANT AAAAVAKVLP ASADEVSVAV AALFGTHAQE 60

YQTVSAQVAT FHDRFVQTLS AAASSYVAAE AVNVEQSLLA AVNAPTQALF GRPLIGNGAD 120

GSPGTGQAGG PGGILYGNGG NGGSGAPGQR GGAGGAAGLI GNGGNGGAGG VGTTGGAGGH 180

GGAGGWLYGN GGAGGFGGAG AVGGNGGAGG TAGLFGVGGA GGAGGNGIAG VTGTSASTPG 240

GSGTAGGAGG IGGNGGAGGA GGVLMGNGGN GGAGGEGGPG GAGGAGASGA HATNLGADGQ 300

AGGNGGNGGA GGTGGVGGPG GGHGLLGLGG SHGAGGAGGS GGDGGAPGDG GNGATGTWGH 360

NLGAGGTGGN GGNPGAGGAG GAGGASVGGS AHGANGAPGT TSTSGGNGGD GGKGADAISS 420

GQTGANGGRG GDGGQVGNGG AGGAGGRGGA GGLGFGSEAP GRPGGAGGTG GAGGNGGTQA 480

GDGGTGGAGG AGGDGGSGGA GSIGFNASAP GAAGSPGGNG GNGGPGGAGG EGGAGGLALA 540

ASGQNGSQGA GGDGGAGGNG GTPGNGGHGA AGALGVNGGV GGAGGHGGDP GVGGAGGQGG 600

SGSTPGANGA PGNTPTSGGN GGNGGRGADA TGFGQTGASG GRGGDGGLVG NGGAGGAGGN 660

GSKGLPGLGR LGNPGLDGGT GGNGGAGGSG GAWAGNGGTG GAGGTGGVGG TGGSGSDGVN 720

GSSAGADGHP GGTGGVGGTG GKGGDGGDGG AAPNGVAGSQ GPGGAGGDGG TGGVGGNGGR 780

GIDGADGATA GARGQDGGAG GAGGKGGRGG TGGPGGAGPA GTTGSQGAGG NGGSGGTGGD 840

PGDGGNGANG SVFTNNGIGG NGGNGGNAGP SGAGGSGGAG STFGATGSSS SIHVNGGNGG 900

NGGNGDHALS GNGAAGGNGG NGGNGSLRGS GGAGGHGGNG GNASRGMGGD GGTGGAGGNA 960

GQXGNGGAGG NGGDGGTGSD GNPGAITGSG GRGGDGGVGG QGGSVAGDGA DGGRGGAGGT 1020

GGTGLRGTTG ATGATGTFDA GADGHGGNGG TGGVGGTGGA GGGGGNGGAG GKALSPTGNN 1080

GSQGAGGDGG AGGAGGTGGT GGDGGRGAHG TLFSSLAGTG GTGGNGGTGG TGGTGGAGGA 1140

GGTGSTLGAT GATGAAGRAG NGGVGGSGGL GSAFGPGGTG GMGGAGGTST VSAGGDGGRG 1200

GFGGDGLDAS SGGNGGDGGH GGDGFRTAGA GGRGGDGGKG ADPGGLFPIP GAGGKGGTGG 1260

TGGTAHLGPL AIXGQSGQPG QFGSPGADGR GGAGGAGGGG GAGGSF 1306
<212> Type : PRT
<211> Length : 1306

SequenceName : SEQ ID 139 
SequenceDescription : 


Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :

45 MS AAAVAWD Q LAMELASAAA SFNSVTSGLV GESWLGPSSA AMAAAVAPYL GWLAAAAAQA 60 

QRSATQAAAL VAE F EAVRAA MVQPALVAAN RSDLVSLVFS NFFGQNAPAI AAIEAAYEQM 120 

WAIDVSVMSA YHAGASAVAS ALTPFTAPPQ NLTDLPAQLA AAPAAWTAA ITSSKGVLAN 180 

LSLGLANSGF GQMGAANLGI LNLGSLNPGG NNFGLGNVGS NNVGLGNTGN GNIGFGNTGN 240 

GNI GFGLTGD NQQGFGGWNS GTGN I GLFNS GTGNIGIGNT GTGNFGI GNS GTSYNTGIGN 300 

50 TGQANTGFFN AGIANTGIGN TGNYNTGS FN LGS FNTGDFN TGSSNTGFFN PGNLNTGVGN 360 

TGNVNTGGFN SGNYSNGFFW RGDYQGLIGF SGTLTIPAAG LDLNGLGSVG PI TIPS I TIP 420 

EIGLGINSSG ALVGPINVPP ITVPAIGLGI NSTGALVGPI NIPPITLNSI GLELSAFQVI 480 

NVGSISIPAS PLAIGLFGVN PTVGSIGPGS ISIQLGTPEI PAIPPFFPGF PPDYVTVSGQ 540 

IGPITFLSGG YSLPAIPLGI DVGGGLGPFT VFPDGYSLPA IPLGIDVGGG LGPFTVFPDG 600 

55 YSLPAIPLGI DVGGGLGPFT VFPDGYSLPA IPLGIDVGGA IGPLTTPPIT IPSIPLGIDV 660 

SGSLGPINIP IEIAGTPGFG NSTTTPSSGF FNSGTGGTSG FGNVGSGGSG FWNIAGNLGN 72 0 

SGFLNVGPLT SGILNFGNTV SGLYNTSTLG L AT S AFHSGV GNTDSQLAGF MRNAAGGTLF 78 0 

NFGFANDGTL NLGNANLGDY NVGSGNVGSY NFGSGNIGNG SFGFGNIGSN NFGFGNVGSN 840 

NLGFANTGPG LTEALHNIGF GNI GGNNYGF ANIGNGNIGF GNTGTGNIGI GLTGDNQVGF 900 

60 GALNSGSGNI GFFNSGNGNI GFFNSGNGNV GIGNSGNYNT GLGNVGNANT GLFNTGNVNT 960 

GIGNAGSYNT GSYNAGDTNT GDLNPGNANT GYLNLGDLNT GWGNIGDLNT GALISGSYSN 102 0 

GILWRGDYQG LIGYSDTLSI PAIPLSVEVN GGIGPIWPD ITIPGIPLSL NALGGVGP IV 1080 

VPDITIPGIP LSLNALGGVG PIWPDITIP GIPLSLNALG GVGPIWPDI TIPGIPLSLN 1140 

ALGGVGP I W PDITIPGIPL SLNALGGVGP ITVPGVPISR IPLTINIRIP VNI TLNEL P F 120 0 

65 NVAGIFTGYI GPIPLSTFVL GVTLAGGTLE SGIQGFSVNP FGLNIPLSGA TNAVTI PGFA 12 60 

INPFGLNVPL SGGTSPVTIP GFAINPFGLN VPLSGGTSPV TIPGFTIPGS PLNLTANGGL 1320 

GPINIPINIT SAPGFGNSTT TPSSGFFNSG DGSASGFGNV GPGISGLWNQ VPNALQGGVS 13 80 
GIYNVGQLAS GVANLGNTVS GFNNTSTVGH LTAAFNSGVN NIGQMLLGFF SPGAGP 1436

<212> Type : PRT 
<211> Length : 1436

SequenceName : SEQ ID 140 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MEFPVLPPEI NSVLMYSGAG SSPLLAAAAA WDGLAEELGS AAVS FGQVTS GLTAGVWQGA 60 

AAAAMAAAAA PYAGWLGSVA AAAEAVAGQA RVWGVFEAA LAATVDPALV AANRARLVAL 120 

AVSNLLGQNT PAIAAAEAEY ELMWAADVAA MAGYHS GAS A AAAALPAFSP PAQALGGGVG 3.80 

15 AFLTALFAS P AKAL S LNAGL GNVGNYNVGL GNVGVFNLGA GNVGGQNLGF GMAGGTNVGF 240, 

GNLGNGNVGF GNSGLGAGLA GLGNIGLGNA GSSNYGFANL GVGNI GFGNT GTNNVGVGLT 3 00 

GNHLTGIGGL NSGTGNIGLF NSGTGNVGFF NSGTGNFGVF NSGNYNTGVG NAGTASTGLF 360 

NAGNFNTGW NVGSYNTGSF NAGDTNTGGF NPGGVNTGWL NTGNTNTGIA NSGNVNTGAF 420 

ISGNFNNGVL WVGDYQGLFG VSAGSSIPAI PIGLVLNGDI GPITIQPIPI LPTIPLSIHQ 480 

20 TVNLGPLWP DIVIPAFGGG IGIPINIGPL TITPITLFAQ QTFVNQLPFP TFSLGKITIP 540 

QIQTFDSNGQ LVSFIGPIVI DTTIPGPTNP QIDLTIRWDT PPITLFPNGI SAPDNPLGLL 60 0 

VSVSISNPGF TIPGFSVPAQ PLPLSIDIEG QIDGFSTPPI TIDRIPLTVG GGVTIGPITI 660 

QGLHIPAAPG VGNTTTAPSS GFFNSGAGGV SGFGNVGAGS SGWWNQAPSA LLGAGSGVGN 720 

VGTLGSGVLN LGSGI SGFYN TSVLPFGTPA AVSGIGNLGQ QLSGVSAAGT TLRSMLAGNL - 780 

25 GLANVGNFNT GFGNVGDVNL GAANIGGHNL GLGNVGDGNL GLGNIGHGNL GFANLGLTAG 840 

AAGVGNVGFG NAGINNYGLA NMGVGNIGFA NTGTGNIGIG LVGDHRTGIG GLNSGIGNIG 900 

LFNSGTGNVG FFNSGTGNFG IGMSGRFNTG I GNS GTASTG LFNAGSFSTG IANTGDYNTG 960 

SFNAGDTNTG GFNPGGINTG WFNTGHANTG LANAGTFGTG AFMTGDYSNG LLWRGGYEGL 1020 

VGVRVGPTIS QFPVTVHAI G GVGPLHVAPV PVPAVHVEIT DATVGLGPFT VPPISIPSLP 1080 

30 IASITGSVDL AANTISPIRA LDPLAGSIGL FLEPFRLSDP FITIDAFQW AGVLFLENII 1140 

VPGLTVSGQI LVTPTPIPLT LNLDTTPWTL FPNGFTIPAQ TPVTVGMEVA NDGFTFFPGG 120 0 

LTF PRASAGV TGLSVGLDAF TLLPDGFTLD TVPATFDGTI LIGDIPIPII DVPAVPGFGN 1260 

TTTAPSSGFF NTGGGGGSGF ANVGAGTSGW WNQGHDVLAG AGS GVANAGT LSSGVLNVGS 1320 

GISGWYNTST LGAGTPAWS GIGNLGQQLS GFLANGTVLN RSPIVNIGWA DVGAFNTGLG 13 80 

35 NVGDLNWGAA NIGAQNLGLG NLGSGNVGFG NIGAGNVGFA NSGPAVGLAG LGNVGLSNAG 1440 

SNNWGLANLG VGNI GLANTG TGNIGIGLVG DYQTGIGGLN SGSGNIGLFN SGTGNVGFFN 1500 

TGTGNFGLFN SGSFNTGIGN SGTGSTGLFN AGNFNTGIAN PGSYNTGSFN VGDTNTGGFN 1560 

PGDINTGWFN TGIMNTGTRN TGALMSGTDS NGMLWRGDHE GLFGLSYGIT IPQFPIRITT 1620 

TGGIGPIVIP DTTILPPLHL QITGDADYSF TVPDIPIPAI HIGINGWTV GFTAPEATLL 1680 

40 SAL KNNGS F I SFGPITLSNI DIPPMDFTLG LPVLGPITGQ LGPIHLEPIV VAGIGVPLEI 1740 

EPIPLDAISL SESIPIRIPV DIPASVIDGI SMSEWPIDA SVDIPAVTIT GTTISAIPLG 1800 

FDIRTSAGPL NIPIIDIPAA PGFGNSTQMP SSGFFNTGAG GGSGIGNLGA GVSGLLNQAG 1860 

AGSLVGTLSG LGNAGTLASG VLNSGTAISG LFNVSTLDAT TPAVISGFSN LGDHMSGVS I 1920 

DGLIAILTFP PAESVFDQII DAAIAELQHL D I GNALALGN VGGVNLGLAN VGEFNLGAGN 1980 

45 VGNINVGAGN LGGSNLGLGN VGTGNLGFGN IGAGNFGFGN AGL TAGAGGL GNVGLGNAGS 2040 

GSWGLANVGV GNI GLANTGT GNIGIGLTGD YRTGIGGLNS GTGNLGLFNS GTGNI GFFNT 2100 

GTGNFGLFMS GSYSTGVGNA GTAS TGLFNA GNFNTGLANA GSYNTGSLNV GSFNTGGVNP 2160 

GTVNTGWFNT GHTNTGLFNT GNVNTGAFNS GSFNNGALWT GDYHGLVGFS FSIDIAGSTL 2220 

LDLNETLNLG PIHIEQIDIP GMSLFDVHEI VEIGPFTIPQ VDVPAIPLEI HESIHMDPIV 22 80 

50 LVPATTIPAQ TRTIPLDIPA SPGSTMTLPL I SMRFEGEDW ILGSTAAIPN FGDPFPAPTQ 2340 

GITIHTGPGP GTTGELKISI PGFEIPQIAT TRFLLDVNIS GGLPAFTLFA GGLTIPTNA1 24 0 0 

PL T I DAS GAL DPITIFPGGY TIDPLPLHLA LNLTVPDSSI PIIDVPPTPG FGNTTATPSS 2460 

GFFNSGAGGV SGFGNVGSNL S GWWNQAAS A LAGSGSGVLN VGTLGSGVLN VGSGVSGIYN 252 0 

TSVLPLGTPA VL S GLGNVGH QLSGVSAAGT ALNQIPILNI GLADVGNFNV GFGNVGDVNL 25 80 

55 GAANLGAQNL GLGNVGTGNL GFANVGHGNI GFGNS GL TAG AAGLGNTGFG NAGSANYGFA 2 64 0 

NQGVRNIGLA NTGTGNIGIG LVGDNLTGIG GLNSGAGNIG LFNSGTGNIG FFNSGTGNFG 2700 

IGNSGSFNTG IGNSGTGSTG LFNAGSFNTG VANAGSYNTG SFNAGDTNTG GFNPGTINTG 2760 

WFNTGHTNTG IANSGNVGTG AFMSGNFSNG LLWRGDHEGL FSLFYSLDVP RITIVDAHLD 2820 

GGFGPWLPP IPVPAVNAHL TGNVAMGAFT IPQIDIPALT PNITGSAAFR IWGSVRIPP 28 80 

60 VSVIVEQIIN AS VGAEMRI D PFEMWTQGTN GLGITFYSFG SADGSPYATG PLVFGAGTSD 2940 

GSHLTISASS GAFTTPQLET GPITLGFQVP GSVNAITLFP GGLTFPATSL LNLDVTAGAG 30 00 

GVDIPAITWP EIAASADGSV YVLASSIPLI NIPPTPGIGN STITPSSGFF NAGAGGGSGF 3 060 

GNFGAGTSGW WNQAHTALAG AGS GFANVGT LHSGVLNLGS GVSGIYNTST LGVGTPALVS 3120 

GLGNVGHQLS GLLSGGSAVN PVTVLNIGLA NVGSHNAGFG NVGEVNLGAA NLGAHNLGFG 3180 

65 NIGAGNLGFG NIGHGNVGVG NSGLTAGVPG LGNVGLGNAG GNNWGLANVG VGNI GLANTG 3240 

TGNIGIGLTG DYQTGIGGLN SGAGNLGLFN SGAGNVGFFN TGTGNFGLFN SGSFNTGVGN 33 00 

SGTGSTGLFN AGS FNTGVAN AGSYNTGSFN VGDTNTGGFN PGSINTGWLN AGNANTGVAN 3360 
AGNVNTGAFV TGNFSNGILW RGDYQGLAGF AVGYTLPLFP AVGADVSGGI GPITVLPPIH 3420

IPPIPVGFAA VGGIGPIAIP DISVPSIHLG LDPAVHVGSI TVNPITVRTP PVLVSYSQGA 3480 

VTSTSGPTSE IWVKPSFFPG IRIAPSSGGG ATS TQGAYF V GPISIPSGTV TFPGFTIPLD 3540 

PIDIGLPVSL TIPGFTIPGG TLIPTLPLGL ALSNGIPPVD IPAIVLDRIL LDLHADTTIG 3600 

5 PINVPIAGFG GAPGFGNSTT LPSSGFFNTG AGGGSGFSNT GAGMS GLLNA MSD PLLGS AS 3 660 

GFANFGTQLS GILNRGAGIS GVYNTGALGV VTAAWSGFG NVGQQLSGLL FTGVGP 3 716 

<212> Type : PRT 
<211> Length : 3716 
SequenceName : SEQ ID 141

SequenceDescription : 

Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MNFPVLPPEI NSVLMYSGAG SSPLLAAAAA WDGLAEELGS AAVS FGQVTS GLTAGVWQGA 60 

AAAAMAAAAA PYAGWLGSVA AQAVAVAGQA RAAVAAFEAA LAATVDPAAV AVNRMAMRAL 120 

AMSNLLGQNA AAI AAVEAE Y ELMWAADVAA MAGYHS GAS A AAAALPAFSP PAQALGGGVG 180 

20 AFLNALFAGP AKMLRLNAGL GNVGNYNVGL GNVG I FNLGA ANVGAQNLGA ANAGSGNFGF 24 0 

GNIGNANFGF GNSGLGLPPG MGNI GLGNAG SSNYGLANLG VGNIGFANTG SNNIGIGLTG 3 00 

DNLTGIGGLN SGTGNLGLFN SGTGNIGFFN SGTGNFGVFN SGSYNTGVGN AGTASTGLFN 3 60 

VGGFNTGVAN VGS YNTGS FN AGNTNTGGFN PGNVNTGWLN TGNTNTGIAN SGNVNTGAFI 420 

SGNFSNGVLW RGDYEGLWGL SGGSTIPAIP IGLELNGGVG PITVLPIQIL PTIPLNIHQT 48 0 

25 FSLGPLWPD I VI PAFGGGT AIPISVGPIT ISPITLFPAQ NFNTTFPVGP FFGLGWNIS 540 

GIEIKDLAGN VTLQLGNLNI DTRINQSFPV TVNWSTPAVT IFPNGISIPN NPLALLASAS 600 

IGTLGFTIPG FTIPAAPLPL TIDIDGQIDG FSTPPITIDR IPLNLGASVT VGPILINGVN 660 

IPATPGFGNT TTAPSSGFFN SGDGGVSGFG MFGAGSSGWW NQAQTEVAGA GSGFANFGSL 72 0 

GSGVLNFGSG VSGLYNTGGL PPGTPAWSG IGNVGEQLSG LSSAGTALNQ SLIINLGLAD 78 0 

30 VGSVNVGFGN VGDFNLGAAN IGDLNVGLGN VGGGNVGFGN IGDANFGLGN AGLAAGLAGV 840 

GNIGLGNAGS GNVGFGNMGV GNIGFGNTGT NNLGIGLTGD NQTGIGGLNS GAGNIGLFNS 900 

GTGNVGLFNS GTGNFGLFNS GSFNTGIGNG GTGSTGLFNA GNFNTGVANP GS YNTGS FNV 960 

GDTNTGGFNP GS INTGWFNT GNANTGVANS GNVDTGALMS GNFSNGILWR GNFEGLFGLN 102 0 

VGITIPEFPI HWTSTGGIGP IIIPDTTILP PIHLGLTGQA NYGFAVPDIP IPAIHIDFDG 1080 

35 AADAGFTAPA TTLLSALGIT GQFRFGPITV SNVQLNPFNV NLKLQFLHDA FPNEFPDPTI 1140 

SVQIQVAIPL TSATLGGLAL PLQQTIDAIE LPAISFSQSI PIDIPPIDIP ASTINGISMS 12 0 0 

EWPIDVSVD IPAVTITGTR IDPIPLNFDV LSSAGPINIS IIDIPALPGF GNSTELPSSG 1260 

FFNTGGGGGS GIANFGAGVS GLLNQASSPM VGTLSGLGNA GSLASGVLNS GVDISGMFNV 1320 

STLGSAPAVI SGFGNLGNHV SGVS IDGLLA MLTSGGSGGS GQPSIIDAAI AELRHLNPLN 1380 

40 I VNLGNVGS Y NLGFANVGDV NLGAGNLGNL NLGGGNLGGQ NLGLGNLGDG NVGFGNLGHG 144 0 

NVGFGNSGLG ALPGIGNIGL GNAG SNNVGF GNMGLGNIGF GNTGTNNLGI GLTGDNQTGF 1500 

GGLNSGAGNL GLFNSGTGNI GFFNTGTGNW GLFNSGSYNT GIGNSGTGST GLFNAGSFNT 1560 

GLANAGSYNT GSLNAGNTNT GGFNPGNVNT GWFNAGHTNT GGFNTGNVNT GAFNSGSFNN 1620 

GALWTGDHHG LVGFSYSIEI TGSTLVDINE TLNLGPVHID QIDIPGMSLF DIHELVNIGP 168 0 

45 FRIEPIDVPA WLDIHETMV IPPIVFLPSM TIGGQTYTIP LDTPPAPAPP PFRLPLLFVN 1740 

ALGDNWIVGA SNSTGMSGGF VTAPTQGILI HTGPS SATTG SIiALTLPTVT IPTITTSPIP 18 0 0 

LKIDVSGGLP AFTLFPGGLN IPQNAIPLTI DASGVLDPIT IFPGGFTIDP LPLSLALNIS 1860 

VPDSSVPIII VPPTPGFGNA TATPSSGFFN SGAGGVSGFG NFGAGSSGWW NQAHAALAGA 192 0 

GSGVLNVGTL NSGVLNVGSG I SGLYNTAI V GLGTPALVSG AGNVGQQLSG VLAAGTALTQ 1980 

50 SPIINLGLAD VGNYNLGLGN VGDFNLGAAN LGDLNLGLGN I GNANVGF GN I GHGNVGFGN 2040 

SGLGAALGIG NIGLGNAGST NVGL ANMGV G N I GFANTGTN NLGIGLTGDN QTGIGGLNSG 2100 

AGNIGLFNSG TGNIGFFNSG TGNWGLFNSG SFNTGIGNSG TGSTGLFNAG GFTTGLANAG 2160 

SYNTGSFNVG DTNTGGFNPG SINTGWFNTG NANTGIANSG NVDTGALMSG NFSNGILWRG 2220 

NYEGLFSYSY SLDVPRITIL DAHF TGAFGP VWPPIPVLA INAHLTGNAA MGAFTIPQID 22 80 

55 I P ALNPNVTG SVGFGPIAVP SVTIPALTAA RAVLDMAASV GATSEIEPFI VWTSSGAIGP 23 40 

TWYSVGRIYN AGDLFVGGNI ISGIPTLSTT GPVHAVFNAA SQAFNTPALN IHQIPLGFQV 24 00 

PGSIDAITLF PGGLTFPANS LLNLDVFVGT PGATIPAITF PEIPANADGE LYVIAGDIPL 2460 

INIPPTPGIG NTTTVPSSGF FNTGAGGGSG FGNFGANMSG WWNQAHTALA GAGS G I ANVG 2520 

TLHSGVLNLG SGLSGIYNTS TLPLGTPALV SGLGNVGDHL SGLLASNVGQ NPITIVNIGL 2580 

60 ANVGNGNVGL GNIGNLNLGA ANIGDVNLGF GNIGDVNLGF GNI GGGNVGF GNIGDANFGF 2640 

GNS GIiAAGLA GMGNIGLGNA GSGNVGWANM GLGNIGFGNT GTNNLGI GLT GDNQSGIGGL 2700 

NSGTGNIGLF NSGTGNIGFF NSGTANFGLF NSGSYNTGIG NSGVASTGLV NAGGFNTGVA 2760 

NAGS YNTGS F NAGD TNTGGF NPGSTNTGWF NTGNANTGVA NAGNVNTGAL ITGNFSNGIL 2820 

WRGNYEGLAG FSFGYPIPLF PAVGADVTGD IGPATIIPPI HIPSIPLGFA ' AIGHIGPISI 28 80 

65 PNIAIPSIHL GIDPTFDVGP ITVDPITLTI P GL S LD AAVS EIRMTSGSSS GFKVRPSFSF 2940 

FAVGPDGMPG GEVSILQPFT VAPINLNPTT LHFPGFTIPT GPIHIGLPLS LTIPGFTIPG 30 00 

GTLIPQLPLG LGLSGGTPPF DLPTWIDRI PVELHASTTI GPVSLPIFGF GGAPGFGNDT 3060 
TAPSSGFFNT GGGGGSGFSN SGSGMSGVLN AISDPLLGSA SGFANFGTQL SGILNRGAGI 3120

SGVYNTGTTiG LVTSAFVSGF MNVGQQLSGL LFAGTGP 3157 
<212> Type : PRT 
<211> Length : 3157 

SequenceName : SEQ ID 142

SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MSFWMPPEI NSLLIYTGAG PGPLLAAAAA WDELAAELGS AAAAFGSVTS GLVGGIWQGP 60 

SSVAMAAAAA P YAGWL S AAA ASAESAAGQA RAWGVFEAA LAETVDPFVI AANRSRLVSL 12 0 

, , ALSNLFGQNT PAIAAAEFDY ELMWAQDVAA MLGYHTGASA AAEAliAPFGS PLASLAAAAE . 180 

15 PAKSLAVNLG LANVGLFNAG SGNVGSYNVG AGNVGS YNVG GGNI GGNNVG LGNVGWGNFG 240 

LGNSGLTPGL MGLGNIGFGN AGSYNFGLAN MGVGNIGFAN TGSGNFGIGL TGDNLTGFGG 3 00 

FNTGSGNVGL FNSGTGNVGF FNSGTGNWGV FNSGSYNTGI GNSGIASTGL FNAGGFNTGV 3 60 

VNAGSYNTGS FNAGEANTGG FNPGSVNTGW LNTGDINTGV ANSGDVNTGA FISGNYSNGV 42 0 

LWRGDYQGLL GFSSGANVIiP VIPLSLDING GVGAITIEPI HILPDIPINI NETLYLGPLV 48 0 

20 VPPINVPAIS LGVGIPNISI GPIKINPITL WPAQNFNQTI TLAWPVSSIT IPQIQQVALS 540 
♦'PSPIPTTLIG PIHINTGFSI PVTFSYSTPA LTLFPVGLSI PTGGPLTLTL GVTAGTEAFT 60 0 

IPGFSIPEQP LPLAINVIGH INALSTPAIT IDNIPLNLHA IGGVGPVDIV GGNVPASPGF 660 
GNSTTAPSSG FFNTGAGGVS GFGNVGAHTS GWFNQSTQAM QVLPGTVSGY FNSGTLMSGI 72 0 

GNVGTQLSGM LSGGALGGNN FGLGNIGFDN VGFGNAGSSN FGLANMGIGN IGLANTGNGN 78 0 

25 IGIGLSGDNL TGFGGFNSGS ENVGLFNSGT GNVGFFNSGT GNLGVFNSGS HNTGFFLTGN 840 
NINVtiAPFTP GTLFTISEIP IDLQVIGGIG PIHVQPIDIP AFDIQITGGF IGIREFTLPE 900 
ITIPAIPIHV TGTVGLEGFH VNPAFVLFGQ TAMAE I TAD P WLPDPFITI DHYGP PLGPP 960 

GAKFPSGSFY LSISDLQING PIIGSYGGPG TIPGPFGATF NLSTSSLALF PAGLTVPDQT 1020 

PVTVNLTGGL DSITLFPGGL AFPENPWSL TNFSVGTGGF TVFPQGFTVD RIPVDLHTTL 10 8 0 

30 SIGPFPFRWD YIPPTPANGP I PAVPGGFGL TSGLFPFHFT LNGGIGPISI PTTTWDALN 1140 

PLLTVTGNLE VGPFTVPDIP I PAINFGLD G NVNVSFNAPA TTLLSGLGIT GSIDISGIQI 1200 

TNIQTQPAQL FMSVGQTLFL FDFRDGIELN PIVIPGSSIP ITMAGLSIPL PTVSESIPLN 1260 

FSFGSPASTV KSMILHEII*P IDVSINLEDA VFIPATVLPA IPLNVDVTIP VGPINIPIIT 132 0 

EPGSGNSTTT TSDPFSGLAV PGLGVGLLGL FDGSIANNLI SGFNSAVGIV GPNVGLSNLG 138 0 

35 GGNVGLGNVG DFNLGAGNVG GFNVGGGNIG GNNVGLGNVG FGNVGLANSG LTPGLMGLGN 144 0 

IGFGNAGSYN FGLANMGVGN IGFANTGSGN FGI GLTGDNIi TGFGGFNTGS GNVGLFNSGT 15 0 0 

GNVGFFNSGT GNWGVFNSGS YNTGIGNSGI ASTGLFNAGG FNTGWNAGS YNTGS FNAGQ 1560 

ANTGGFNPGS VNTGWLNTGD INTGVANSGD VNTGAFISGN YSNGAFWRGD YQGLLGFSYR 162 0 

PAVLPQTPFIi DLTLTGGLGS WIPAIDIPA IRPEFSANVA IDSFTVPSIP IPQIDLAATT 1680 

40 VSVGLGPITV PHLDIPRVPV TLNYLFGSQP GGPLKIGPIT GLFNTPIGLT PLALSQIVIG 174 0 

ASSSQGTITA FLANLPFSTP WTIDEIPLL ASITGHSEPV DIFPGGLTIP AMNPLSINLS 18 0 0 

GGTGAVT I PA ITIGEIPFDL VAHSTLGPVH ILIDLPAVPG FGNTTGAPSS GFFNSGAGGV 1860 

SGFGNVGAMV SGGWNQAPSA LLGGGS GVFN AGTLHSGVLN FGSGMSGLFN TSVLGLGAPA 192 0 

LVSGLGSVGQ QLSGLLASGT ALHQGLVLNF GLADVGLGNV GLGNVGDFNL GAGNVGGFNV 1980 

45 GGGNIGGNNV GLGNVGWGNF GLGNSGLTPG LMGLGNIGFG . NAGSYNFGLA NMGVGN I GFA 2040 

NTGSGNFGIG LTGDNLTGFG GFNTGS GNVG LFNSGTGNVG FFNSGTGNWG VFNSGSYNTG 210 0 

IGNSGIASTG LFNAGGFNTG WNAGSYNTG SFNAGQANTG GFNPGSVNTG WLNTGDINTG 2160 

VANS GDVNTG AFISGNYSNG AFWRGDYQGL LGFSYTSTII PEFTVANIHA SGGAGPIIVP 222 0 

SIQFPAIPLD LSATGHIGGF TIPPVSISPI TVRIDPVFDL GPITVQDITI PALGLDPATG 228 0 

50 VTVGPIFSSG SIIDPFSLTL LGFINVNVPA IQTAPSEILP FTVLLSSLGV THLTPEITIP 2340 

GFHIPVDPIH VELPLSVTIG PFVSPEITIP QLPLGLALSG ATPAFAFPLE ITIDRIPWL 2400 

DVNALLGPIN AGLVIPPVPG FGNTTAVPSS GFFNIGGGGG LSGFHNLGAG MS GVLNAI SD 2460 

PLLGSAS GFA NFGTQLSGIL NRGAD I SGVY NTGALGL ITS ALVSGFGNVG QQLAGLIYTG 252 0 
TGP 2523

<212> Type : PRT

<211> Length : 2523

SequenceName : SEQ ID 143
SequenceDescription :

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVIAVPEA LTMAASDLAN IGSTINAANA AAALPTTGW AAAADEVSAA VAALFGSYAQ 60 

SYQAFGAQLS AFHAQFVQSL TNGARSYWA EATSAAPLQD LLGWNAPAQ ALLGRPLIGN 120

GANGADGTGA PGGPGGLLLG NGGNGGSGAP GQPGGAGGDA GLIGNGGTGG KGGDGLVGSG 180

AAGGVGGRGG WLLGNGGTGG AGGAAGATLV GGTGGVGGAT GLIGSGGFGG AGGAAAGVGT 240 
TGGVGGSGGV GGVFGNGGFG GAGGLGAAGG
LLIGNGGVGG LGGAGAAGGN GGAGGMLLGD 
FGSGGAGGQG GTGLAGTNGV NPGSIANPNT 
GVGGQGGLGE SLDGNDGTGG KGGAGGTAGT 
DGATGGVDGG VGGAGGKGGQ GHNTGVGDAF
GRGGMLIGNG GAGGAGGTGG TGGGGAAGFA 
VGGTGGMGGS GGVGGNGGAA GSLIGLGGGG 
GGGAT I GGGG GTGGVGGAGG TGGTGGAGGT 
LGGQGGNGGN GGTGATGGQG GDFALGGNGG 

<212> Type : PRT
<211> Length : 778

SequenceName : SEQ ID 144
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :
PQGADGNAGN GGDGGVGGNG GNGADNTTTA

QQGNGGNGGN GGTGGKGGTG GDGAIiAGS S G 

GKGGNGG I GA AGTTGPVGTG ASGGTGGSGG 

GGAGVTSSTA GNSGGAGGSG GKGGDAGAGG 

TGAGDGGHGG TGAAGGNGGT GGAGGSGIDG 
GGNGGIGGKG GNAGAGGAAG SNGGTVGANG

GGTGGRGGSG GAGGDGI GGV GGGKGGNGAD 

GGAGGAAGAG GAGGGANGTA GNGGQGGAGG 

GGTGGAGGTG GAAGDGGQGG QGGAGGGAGG 

GGAAGKGGAG GQGGTGGGTG GQGGAGGDGG 
AGGQGGADGG SGGDGGDAGT GGNGGNGGNR

TGGNGGAGGD AGDAGNGGNG NGTGNGGNGG 

NAGMGGNSGT GSGDGGAGGN GGAAGTGGTG 

TANMTAQAGG DGGNGGDGGF GGGAGAGGGG 

DDPGGNGGTG GNGGTGGTGG AGIGSLGGGT 
NGGDGGTGGT GGGDGGAGGT GGTGGTGGLG

AGGNGNGGTG GAGGXGGTGG NGGDAEPGVP 

GGDGGTGGGG GNGGTGWNGG KGDTGS GGGA 

<212> Type : PRT 
<211> Length : 1079

SequenceName : SEQ ID 145
SequenceDescription :

Sequence


<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MVMSLMVAPE LVAAAAADLT GIGQAISAAN AAAAGP TTQV LAAAGD EVS A AIAALFGTHA 60 

QEYQALSARV ATFHEQFVRS LTAAGSAYAT AEAANASPLQ ALEQQVLGAI NAPTQLWLGR 12 0 

50 PLIGDGVHGA PGTGQPGGAG GLLWGNGGNG GSGAAGQVGG PGGAAGL FGN GGS GGSGGAG 180 

AAGGVGGSGG WLNGNGGAGG AGGTGANGGA GGNAWLFGAG GSGGAGTNGG VGGSGGFVYG 240 

NGGAGGI GGI GGIGGNGGDA GLFGNGGAGG AGAAGLPGAA GLNGGDGSDG GNGGTGGNGG 3 00 

RGGLLVGNGG AGGAGGVGGD GGKGGAGDPS FAVNNGAGGN GGHGGNPGVG GAGGAGGLLA 3 60 

GAHGAAGATP TSGGNGGDGG I GATANS PL Q AGGAGGNGGH GGLVGNGGTG GAGGAGHAGS 420 

55 TGATGTALQP TGGNGTNGGA GGHGGNGGNG GAQHGDGGVG GKGGAGGSGG AGGNGFDAAT 480 

LGS PGADGGM GGNGGKGGDG GKAGDGGAGA AGDVTLAVNQ GAGGDGGNGG EVGVGGKGGA 540 

GGVSANPALN GSAGANGTAP TSGGNGGNGG AGATPTVAGE NGGAGGNGGH GGSVGNGGAG 600 

GAGGNGVAGT GLALNGGNGG NGGIGGNGGS AAGTGGD GGK GGNGGAGANG QDFSASANGA 660 

NGGQGGNGGN GGI GGKGGDA FATFAKAGNG GAGGNGGNVG VAGQGGAGGK GAI PAMKGAT 720 

60 GADGTAPTSG GDGGNGGNGA SPTVAGGNGG DGGKGGSGGN VGNGGNGGAG GNGAAGQAGT 780 

PGPTSGDSGT SGTDGGAGGN GGAGGAGGTL AGHGGNGGKG GNGGQGGIGG AGERGADGAG 84 0 

PNANGANGEN GGSGGNGGDG GAGGNGGAGG KAQAAGYTDG ATGTGGD GGN GGDGGKAGDG 90 0 

GAGENGLNSG AMLPGGGTVG NPGTGGNGGN GGNAGVGGTG GKAGTGS LTG LDGTDGITPN 960 

GGNGGNGGNG GKGGTAGNGS GAAGGNGGNG GSGLNGGDAG NGGNGGGALN QAGFFGTGGK 1020 

65 GGNGGNGGAG MINGGLGGFG GAGGGGAVDV AATTGGAGGN GGAGGF AS TG LGGPGGAGGP 10 8 0 

GGAGDFASGV GGVGGAGGDG GAGGVGGFGG QGGIGGEGRT GGNGGSGGDG GGGISLGGNG 1140 

GLGGNGGVSE TGFGGAGGNG GYGGPGGPEG NGGLGGNGGA GGNGGVSTTG GDGGAGGKGG 1200 



VGGAASYFGT GGGGGVGGDG APGGDGGAGP 300

GGAGGQGGPA VAGVLGGMPG AGGNGGNANW 360

GANGTDNSGN GNQTGGNGGP GPAGGVGEAG 420

DGGAGGAGGA GGIGETDGSA GGVATGGEGG 480

GGDGGIGGDG NGALGAAGGN GGTGGAGGNG 540

GGVGGAGGEG LTDGAGTAEG GTGGLGGLGG 600

GAGGVGGTGG IGGIGGAGGN GGAGGAGTTT 660

TGGSGGAGGL IGWAGAAGGT GAGGTGGQGG 720

AGGAGGSPGG SSGIQGNMGP PGTQGADG 778



AAGTTGGAGG AGGAGGTGGT GGAAGTGTGG 60

GAGGKGGNGG DAGKAGTGSA PGTAGTGGDG 120

AGGTGGDGGA ANGGTAGAGG AGGNGGKGGD 180

AGATPGANGI AGNGGDGGDG AAGAVGISGA 240

VGGGTGGTGG NGGNGAIGGA GGDAGGSGNS 300

TGGDGGNGGA AGAATAGSNG GAGTGSAGGN 360

GEVGGAGGAG GSGPNTSPGG NGGQGGQGGS 420

TGGAGAASSA TNGGSGGAGG TGGDGGSGGA 480

QGGAGGAGGT GGNGGNITGG TAGTAGAAGN 540

AGGTGGDRTV GGGTVPAGSG GQGGNAGGGG 600

NSGNGTGGAG GNGGGGANGG AGGAGGSGGG 660

NGGIAGMGGN GGAGTGSGNG GNGGSGGNGG 720

GDGGLTGTGG TGGSGGTGGD GGNGGNGADN 780

LTAGANGTGG QGGAGGDGGN GAIGGHGPLT 840

GGDGGNGGNG GTGGEGGEVG GAGGTGGAAG 900

DPRVGGSGGD GGTGGSGGAA GNGGNGGNAG 960

PGAGGAGGAG TTGGKGGTGG NGSGTGSGGT 1020

GDGGKAPAGG TGGAGGDGGA GGKGGSGGV 1079
NGGDGGNVGL GGDAGSGGAG GNGGIGTDAG GAGGAGGAGG NGGSSKSTTT GNAGSGGAGG 1260

NGGTGLNGAG GAGGAGGNAG VAGVSFGNAV GGDGGNGGNG GHGGDGTTGG AGGKGGNGSS 1320

GAASGSGWN VTAGHGGNGG NGGNGGNGSA GAGGQGGAGG SAGNGGHGGG ATGGDGGNGG 1380

NGGNSGNSTG VAGLAGGAAG AGGNGGGTSS AAGHGGSGGS GGSGTTGGAG AAGGNGGAGA 1440
GGGSLSTGQS GGPRRQRWCR WQRRRWLGRQ RRRRWCRWQR RCRRQRWRWR CRQRRLRRQW 1500
RQGRRRCRPW LHRRRGRQGR RWRQRRFQQR QRSRWQRR 1538

<212> Type : PRT 
<211> Length : 1538

SequenceName : SEQ ID 146
SequenceDescription :

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MSFWTAPPV LASAASDLGG IASMISEANA MAAVRTTALA PAAADEVSAA IAALFSSYAR 60 

DYQTLSVQVT AFHVQFAQTL TNAGQLYAW DVGNGVLLKT EQQVLGVINA PTQTLVGRPL 12 0 

IGDGTHGAPG TGQNGGAGGI LWGNGGNGGS GAPGQPGGRG GDAGLFGHGG HGGVGGP GI A 180 

GAAGTAGLPG GNGANGGSGG I GGAGGAGGN GGLLFGNGGA GGQGGSGGLG GSGGT GGAGM 240 

20 AAGPAGGTGG IGGIGGIGGA GGVGGHGSAL FGHGGINGDG GTGGMGGQGG AGGNGWAAEG 3 00 

I TVGIGEQGG QGGDGGAGGA GGIGGSAGGI GGS QGAGGHG GDGGQGGAGG SGGVGGGGAG 3 60 

AGGDGGAGGI GGTGGNGSIG GAAGNGGNGG RGGAGGMATA GSDGGNGGGG GNGGVGVGSA 420 

GGAGGTGGDG GAAGAGGAPG HGYFQQPAPQ GLPIGTGGTG GEGGAGGAGG DGGQGDIGFD 480 

GGRGGDGGPG GGGGAGGDGS GTFNAQANNG GDGGAGGVGG AGGTGGTGGV GADGGRGGDS 540 

25 GRGGDGGNAG HGGAAQFSGR GAYGGEGGSG GAGGNAGGAG TGGTAGS GGA GGFGGNGADG 60 0 

GISTGGNGGNGG FGGINGTFGT NGAGGTGGLG TLLGGHNGNI GLNGATGG I G STTLTNATVP 660 

LQLVNTTEPV VFISLNGGQM VP VLLDTGS T GLVMDSQFLT QNFGPVIGTG TAGYAGGLTY 720 

NYNTYSTTVD FGNGLLTLPT SVNWTSSSP GTLGNFLSRS GAVGVLGIGP NNGFPGTSSI 780 

VTAMPGLLNN GVLIDESAGI LQFGPNTLTG GITISGAPIS TVAVQIDNGP LQQAPVMFDS 840 

30 GGINGTIPSA LASLPSGGFV PAGTTISVYT SDGQTLLYSY TTTATNTPFV TSGGVMNTGH 9 00 

V^FAQQPIYV SYSPTAIGTT TFN 923 
<212> Type : PRT
<211> Length : 923 

SequenceName : SEQ ID 147 

SequenceDescription :



Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MX GNGGAGGS GAP GAI GGAG GPAGLIGVGG AGGAGGDSAV AGVI GGAGGA GGAALLFGAG 60 

GAGGAGGSGG SGAAGGAGGA GGAGGLFASG GSGGFGGFAS TGTGGAGGTG GAGGLFASGG 120 

VGGTGGGAGS GGTGGVGGTG GAGGLFASGG AGGAGGS GGT GGAGGTGGAG GLFGAGGAGG 180 

LGGQGNHTGG HGGAGGSAGL LALGDGGAGG AGGAATTGTG GAGGAGGKAG LLFGS GGAGG 240 

45 SGGAAGTFGD TGNS GGAGGA GGKAGLLFGS GGAGGSGGAG GFANGSTGGA GGAGGGAGL I 30 0 

GNGGNGGSGG TSVATGGAGN GGAGGAGGGA GLIGNGGNGG SGGMGDAPGG TGVGGIGGLL 3 60 

LGLDGANAPA STNPLHTAQQ QALAAVNAPI QAVTGRPLIG NGANGAPGSG APGGHGGWLF 420 

GGGGTGGSGV S GGAGGD GGA GGILFGAGGA GGAGGAVTGT GATGGS GGAG GGALL FGAGG 480 

AGGAGGS S GI GGFAAGGAGG PGGAGGL FNG GGAGGAGGSG VSGGAGGEGG AGGAGGLFAG 540 

50 GGAGGAGGSG NNVGGAGGAG GVGGLFGAGG AGGSGGGGSV AGDSGAGGNA GLLAPGLAGG 600 

AGGGGGQGFD TGGAGGPGGD AGLLVGS GGV GGAGGFGLTT GGPGAAGGDA GLLFGSGGAG 660 

GA.GGSGRTDL GGAGGAGGKA GLIGNGGNGG AGGAGGNGGG DGGPGGAAFG LGNGGNGGNG 720 

GTGTSAGSPG AGGAGGS L I G AEGLPGLLP 749 
<212> Type : PRT 

<211> Length : 749

SequenceName : SEQ ID 148 
SequenceDescription : 



Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString : 

MSFVIAAPEA LVAVASDLAG I GS ALAEANA AAL APT TALL AAGADEVSAA IAALFGAHGQ 60 

AYQTVSAQAS AFHAQFVQAL TGGGGAYAAA EAANVSAAQS TDQRLLDLIN GPTQALLGRP 12 0 

65 LXGDGANGGP GQDGGPGGLL YGNGGNGGTS TTAGVAGGNG GAAGL I GNGG AGGGGGAGAA 18 0 

GGNGGAGGWL YGNGGAGGAG GTSVIPGVAG GNGGAGGSAG LWGTGGAGGD GGNGRSGPVN 240 

VAGSAGGNGG AGGAAGL FGD AGAGGNGGKG GAGGAAFSIN FTAGDGGAGG AGGSGGHALL 3 00 
WGAGGAGGNG GSGGTGGAGG STAGAGGNGG AGGGGGTGGL LFGNGGAGGH GAAAGNGLAA 360

GNGVSSSGGG GAGGTGGAGG DGGAGGAGGN ARLWGVGGAG GAGGD GGAGG AGGKGGSGIiS 42 0 

GNANGGAGGD SGRGGTGGAG GEGGAAGLLV GTGGHGGDGG AGGAAVKGGD GGAAAGTGIA 48 0 

GAGGRGGAGG SGGSGGDGGG GAAGPAGWLF GDGGAGGNGG AAAAGGAGGQ AGGGGGNGGN 540 

5 GGNGGNGGNG GNGATGGWLY GNGGAGGQGA TAGAGGAGAN GVSSTNGGGT GGNGGIGGTG 60 0 

GSGGAGGNAG LLGVGGAGGH GAS GGAGDRG GAGGTGFISS DGGAGGDGGD GGNGGAGGTG 660 

GLLFGAGGNG GPGGS GGAAD I GGNGGAGNG GGTDGNGGNG GSGGGAGSGG DGGGAGGNGA 720 

WLFGNGGAGG GGGKGGNGAG GGLGGGSFGL PGLNGSGGDG GDGGNGAPGG VLYGNGGAGG 780 

QGSSGGIGGP GATGGAGGKG GDGGDAQL I G DGGNGGNGGA GGTGGTPGPG GPGGS GGLGG 840 

LLFGQTGTAG VSP 853



<212> Type : PRT 

<211> Length : 853 

SequenceName : SEQ ID 149 
SequenceDescription : 


Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

20 MS YLVWPEL VAAAATDLAN IGSSISAANA AAAAPTTALV AAGGDEVSAA IAALFGAHAR 60 

AYQALSAQAA MFHEQFVRAL AAGGNS YAVA EAATAQSVQQ DLLNL INAPT QALLGRPLIG 120 

NGANGLPGTG QNGGDGGILY GNGGNGGSGG VNQAGGNGGN AGLWGNGGSG GAGGNATTAG 180 

RNGFNGGAGG SGGLLWGNGG AGGAGGNGGP APLVGGVGTT GGAGGNGGGA GLFYGFGGAG 24 0 

GNGGMGGVAP STGPSMGILP AGGVGGPGGS GGASALAFGS GGVGGAGGLG GPTDGTVQGV 3 00 

25 GGFGGQGGNG GQSGLLFGNA GAGGAGAAGG AGTGDTESFG GHGGAGGDGG AVGLIGNGGA 3 60 

GGTGSPGAW GGNGGVGGLG GAGSPGGLLY GTGGAGGNGG PGGDGGTGAT VGFAGS GGFG 420 

GAGGIAQLFG TGGMGGSGGG I GAGTTTWP PDVAPVGGTG GNGGRAGLLL GV GGMGGNGG 480 

ATSVGGTLYA AGGNGGDGGL VWGNGGTGGS GGAGGAGSVG NGGAGGNAAL LFGNGGAGGA 540 

GGAGGIGAGG AGGFGAVLFG NGGAGGSGAP GGI GAGGNGG NALLVGNGGN GGAGTGGAAG 600 

GAGGSGGLLF GQNGMPGP 618



<212> Type : PRT 

<211> Length : 618

SequenceName : SEQ ID 150
SequenceDescription :

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MNFSVLPPEI NSALIFAGAG PEPMAAAATA WDGLAMELAS AAASFGSVTS GLVGGAWQGA 60

SSSAMAAAAA PYAAWLAAAA VQAEQTAAQA AAMIAEFEAV KTAWQPMLV AANRADLVSL 120 

VMSNLFGQNA PAIAAIEATY E QMWAADVS A M S A YHAGAS A IASALSPFSK PLQNLAGLPA 18 0 

WLAS GAP AAA MTAAAGI PAL AGGPTAINLG IANVGGGNVG NANNGLANIG NANLGNYNFG 240 

SGNFGNSNIG SASLGNNNIG FGNLGSNNVG VGNLGNLNTG FANTGLGNFG FGNTGNNNIG 3 00 

45 IGLTGNNQIG IGGLNSGTGN FGLFNSGSGN VGFFNSGNGN FGIGNSGNFN TGGWNSGHGN 3 60 

TGFFNAGS FN TGMLDVGNAN TGSLNTGSYN MGDFNPGSSN TGTFNTGNAN TGFLNAGNIN 420 

TGVFNI GHMN NGLFNTGDMN NGVFYRGVGQ GSLQFSITTP DLTLPPLQIP GISVPAFSLP 480 

AITLPSLNIP AATT PANT TV GAFSLPGLTL PSLNIPAATT PAN I TVGAF S LPGLTLPSLN 540 

IPAATTPANI TVGAFSLPGL TLPSLNIPAA TTPANITVGA FSLPGLTLPS LNIPAATTPA 600 

50 NITVGAFSLP GLTLPSLNIP AATTPANI TV SGFQLPPLSI PSVAIPPVTV P P I TVGAFNL 660 

PPLQIPEVTI PQLTIPAGIT IGGFSLPAIH TQPITVGQIG VGQFGLPSIG WDVFLSTPRI 720 

TVPAFGI PFT LQFQTNVPAL QPPGGGLSTF TNGALIFGEF DLPQLWHPY TLTGPIVIGS 780 

FFLPAFNIPG IDVPAINVDG FTLPQITTPA ITTPEFAIPP IGVGGFTLPQ ITTQEIITPE 840 

LTINSIGVGG FTLPQITTPP ITTPPLTIDP INLTGFTLPQ ITTPPITTPP LTIDPINLTG 900 

55 FTLPQITTPP ITTPPLTIEP IGVGGFTTPP LTVPGIHLPS TTIGAFAIPG GPGYFNSSTA 960 

PS SGFFNSGA GGNS GFGNNG SGLSGWFNTN PAGLLGGSGY QNFGGLSSGF SNLGSGVSGF 1020 

ANRGILPFSV ASWSGFANI GTNLAGFFQG TTS 1053 
<212> Type : PRT 
<211> Length : 1053 

SequenceName : SEQ ID 151

SequenceDescription : 



Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MLYWASPDL MTAAATNLAE IGSAISTANG AAALPTVEW AAAADEVSTQ IAALFGAHAR 60 
SYQTLSTQAA AFHSRFVQAL TTAAASYASV
GADGSTPGQA GGPGGLLYGN GGNGAAGGPN 
GTGGLLFGNG GAGGQGGLGL AGINGGSGGQ 
PTPIGTAAPG SDGVNQIGNG GNTDLTGGAG 
SFGGAGGAGG DGANGGDGGA GGEALTEGGA
ATTSVTGGNG GNGGNGHDSN APGGAGGSGG 
APGGAGGAGG KADIANSLGD NATVTGGNGG 
IGNGGAGGAG GLGGAGGAGG AGGEGGAGGA 
GAGGAPGLGG AGGAGGWLIG QSGSTGGGGA 
SSGTAGFDGN PGQPG
<212> Type : PRT 
<211> Length : 615 

SequenceName : SEQ ID 152 
SequenceDescription :

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :
MHYSVLPPEI NSALIFAGAG SGPMLAAASA

SSVAMAAAAA PYAGWLAAAA TQAEQAATQA 

VISNLFGQNA PAIAAAEAAY EEMWALDVSA 

GPAAWTALT TAVGMP T FAG RAIAASLGLA 

NFGSFNIGSA NLGGNNIGIG NAGANNFGLA 
LTGNNQIGIG GLNSGNGNVG LFNAGSANIG

FLNAGSFNTG MFDVGNANTG SFNVGHYNFG 

AFNIGDMNNG LFNTGDMNNG VFYRGVGQGS 

TLPSLTIPAV TTPANVTVGA FDLPGIiTVPS 

ATTTPANITV GAFNLPQLSI PSVTVPPITI 
TVGGFTLPTI HTPIiISTPQI SIGGFSTPGI

NITALQTNMP GVFPQIGGFA NTPPAFINTG 

IQPWSLGGIS VDGFTLPEIS TQEFTTPALT 

LGGFTLPQLS IPAITTPAFT IDPIALGGFT 

IPEITTPEFT IQPVGLAAFT TPALTIASIH 
AGIGGNSGFG NSGSGLSGWF NTSPVGLLAG

FAVTSLVSGL ANIGNNLSGL FFQSTTP 

<212> Type : PRT

<211> Length : 987 

SequenceName : SEQ ID 153 
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MSFVWAPEV LAAAASDLAG I GSTLAQANA AAL APT TAVL AAGADEVSAA IASLFGAHGQ 60 

AYQAVSAQMS AFHAQFMQAL TGAGGAYAAA EAVNVSAAQS VEQDLLAAIN ARFERI FGRP 12 0 

L I GDGANGGP GQDGGPGGLL YGNGGNGGTS TTVGMAGGNG GAAGL I GNGG FGGGGGPGAA 180 

GGNGGAGGWL FGNGGAGGAG GLGVAPGVPG GAGGAGGAGG VGGPAGLWGH GGAGGAGGAG 240 

50 VAGAGGFEGT I GAGGAGGVG GAGGVGGAGG AGGWLYGDAG AGGDGGVGGA GGTGGLGNRG 30 0 

GAGGAGGAGG VGGAGGAAGL WGGGGAGGVG GTGGGAGLGA QSVTFSSSLS GLS GGDGGAG 3 60 

GAGGAGGAGG TGGWLYGGGG AAGSGGDGGT GGQGGAGGAG VFSLFGSGGG PGGNGGVGGV 420 

GGVGGAGGRA GLFGVGGLGG AGGDAGDSGE GGFGGP GLAG GLFGNPGNGG VGGIGGDAAA 480 

GGAGGAGGNG GAGGNGGWLF GNGGAGGSGG D GGAAGRGGA GNLGSAGGIN APAGNPGSGS 540 

55 VG I GGAGGAG GTAGLFGDGG AGGAGGAGAA GGFGGISAAT PSAGSEGAMG GAGGVGGNAR 600 

LLGTGGAGGV GGGGGAGGDG GRGGVATPGG QGGDAGDGGA GGAGGNGGGA SGAGGWLLGT 660 

GGAGGAGGNG GNGGKAGFSP GPTNFGLNGA GGGGGVGGNG ATGPWLFGDG GPTPGSTGAG 72 0 

AAGGHGGDAQ L I GNGGHGGA GGTGVPNGSG GAGGLSGLLF GEPGANG 767 
<212> Type : PRT 

60 <211> Length : 767 

SequenceName : SEQ ID 154 
SequenceDe script ion : 

Sequence 

65 

<213> OrganisrnName : Mycobacterium tuberculosis H3 7Rv 
<4 00> PreSequenceString : 



50/341 



EAANAS PLQV ALDVINAPAQ TLLGRPL I GN 120 

QAGGAGGNAG LIGNGGAGGA GGVGAVGGKR 18 0 

GGHGGNAILF GQGGAGGPGG TGAMGVAGTN 24 0 

GDGNAGS TTV NGGNGGTGGA ARNSSGGTGN 3 00 

TAVSGAGGKG GNAEASGGAG GNGGKGGFAQ 3 60 

VGGD GGRGGL LAGNGGTGGA GGNGGTGGAG 42 0 

TGGDGGSALG TGGAGGAGGL GGHGGAGGLL 48 0 

GGEAIPGGAS TNSAGGDGGA GGTGGNGGDG 54 0 

GGAGGAGGAG GAGGSGGAGG HGDTTSGKNG 600 

615 



WDGLATELAS 


AAVSFGSVTA 


GLVGGSWQGR 


60 


QVMVAEFEAV 


RLAMVQPALV 


AANRSGLISL 


120 


MAAYHSGASA 


VAVALPAFAL 


PLRLPAGLAA 


180 


NVGGGNL GNA 


NNGLGN I GNA 


NLGNNNLGSG 


240 


NLGNLNTGFA 


NAG I GNF G I A 


NTGNNNIGNG 


300 


FFNSGNGNFG 


IGNSGNFSTG 


LFNPGHGNTG 


360 


AFNPGP SNTG 


TFNTGGANTG 


WFNTGSINTG 


420 


LQFAITSPDL 


TLPSLEIPGI 


SVPAFSLPAI 


480 


LT I PAAMTPA 


NITVGAFDLP 


GLTVPSLTIP 


540 


PAGTALGAFN 


LPTLSIPSVT 


VPPITIPAGT 


600 


ATQANSGVIN 


LPTFSLNGIT 


ITNLWFIPN 


660 


TITVGGGQIN 


GVGFSIGAIN 


VTPFTLPNW 


720 


ISPIGVGALS 


LPDITTQQFT 


TPELTIDPIT 


780 


LPQIMTPEIT 


TPPFAIDPIG 


LSGFTLPQVN 


840 


LPSTTMGGFA 


IPAGPGYFNS 


SATPSLGFFN 


900 


S GYQNYGGL I 


SGFSNLGSGI 


SGFANTGTLP 


960 
987 
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MSFVIANPEM LAAAATDLAG IRSAISAATA AAAAPTIQVA AAGADEVSLA ISALFGQHAQ 60 

AYQALSAQAT TFHDQFVQAL TSGGNLYAAA ESHTVEQMVL WAINAPTQTL FGRPLIGDGA 12 0 

NGTAENPDGQ ISTGGLLFGNGG NGFTQTTAGV AGGNGGSAGL IGNGGAGGGG GAGAAGGLGG 18 0 

NGGWLYGNGG AGGIGGAGTG TGGHGGAGGA GGRAWLWGTG GAGGAGGDGG WLFGDGGAGG 240 

5 TGGNGGSGFN SLTSSVGGAG GAGGHAGLFG AGGTGGTGG I GGQNTETGPA ASNGGAGGAG 3 00 

GGGGYLVGDG GAGGTGGAGG KNSSGGATLT GGTGGTGGAG GAAGWLYGSG GAGGAGGAGG 360 

LNNAGGATGG TGGTGGAGGS GAWLYGNGGA AGAGGNGGNN TSAGTGGVGA S GGTGGNAGL 42 0 

IGAGGHGGAG GAGGNQTGGV GNGGAGGNGG AGGAGGQLYG NGGDGGNGGA GGANIAGGNG 48 0 

SDGGAAGHGG 2\GGSARLIGA GGHGGDGGAG GNTAGRRADA IAGTGGDGGN GGNGGLLSGN 540 

10 AGAGGHGGAG GSSTATTTTG TPPTGATGGN GGNGGAGGTA GFTGSGGIGG NGGAGGTGGN 60 0 

AGVALSVGST GGLGGNGGSG GLGGGGGSLF GNGGAGGVGA TGGNGGSGIG PASVGGNGGK 660 

GGVGAAGGLA GQIGNGGSGG SGGAGGNGGT GDTAGNGGNG GAGAVGGNAQ LIGNGGNGGG 72 0 

GGNGGTGADG T 731 
<212> Type : PRT 

<211> Length : 731

SequenceName : SEQ ID 155
SequenceDescription :



Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MPGRFRNFGS QNLGSGNIGS TNVGSGNIGS TNVGSGNIGD TNFGNGNNGN FNFGSGNTGS 60
NNIGFGNTGS GNFGFGNTGN NNIGIGLTGD GQIGIGGLNS GSGNIGFGNS GTGNVGLFNS 120
GTGNVGFGNS GTANTGFGNA GNVUTGFWNG GSTNTGLANA GAGNTGFFDA GNYNFGSLNA 180
GMINSSFGNS GDGNSGFLNA GDVNSGVGNA GDVNTGLGNS GNINTGGFNP GTLNTGFFSA 240
MTQAGPNSGF FNAGTGNSGF GHNDPAGSGN SGIQNSGFGN SGYVNTSTTS MFGGNSGVLN 300
TGYGNSGFYN AAWNTGIFV TGVMSSGFFN FGTGNSGLLV SGNGLSGFFK NLFG 354
<212> Type : PRT
<211> Length : 354

SequenceName : SEQ ID 156
SequenceDescription :

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSFVIiAMPEV XiGSAATDLAA LGSVLGAADA AAAATTTGIV AAAQDEVSAA IAALFSAHGR 60 

40 AYQVASAQAA AVHAQFVEAL SAGAGAYASA EAAGAAVLAN PAQSVQQDLL AAVNAQSVAL 120 

TGRPLIGNGA isTGAPGTGANG APGGWLLGNG GAGGSAAAGS GLPGGAGGAA GLFGTGGAGG 180 

AGGSSTVGDG ELAGGAGGSGG WLLGTGGVGG VGGLGAGAGG AGGVGGAGGL LGAGGHGGAG 240 

GLGAVTGGVG GTGGAGGLLA GLLAGPGGAG GTGGRGFLNN GGVGGAGGNA GLLFGAGGTG 3 00 

GSGGAGLGGD GGAGGAGGNT GVL FGNAGS G GTGGFGDTDG GAGGAGGDAG WIiGSGGVGGA 3 60 

45 GGFGETGDGG VGGAGGKAGL LIGNGGAGGA GGQGAVTGGT GGAGGD GVL I GNGGNAGIGG 420 

TGP TAGDTGA GGISGLLLGA DGFNTPASAS PLHTLKQQAL AAINAPTQTL TGRPIi I GNGT 480 

PGAVGS GATG APGGWLLGDG GAGGS GAAGS GAP GGAGGAA GLWGTGGAGG AGGSSAGGGG 540 

AGGAGGAGGW LLGDGGAGGI GGASTVLGGT GGGGGVGGLW GAGGAGGAGG TGLVGGDGGA 600 

GGAGGTGGLL AGLIGAGGGH GGTGGLSTNG DGGVGGAGGN AGMLAGP GGA GGAGGDGENL 660 

50 DTGGDGGAGG SAGLLFGSGG AGGAGGFGFL GGDGGAGGNA GLLLS SGGAG GFGGF GTAGG 720 

VGGAGGNAGW LGFGGAGGVG GSAGLIGTGG NGGNGGTGAN AGS PGTGGAG GLLLGQNGLN 78 0 

GLP 783 
<212> Type : PRT
<211> Length : 783

SequenceName : SEQ ID 157

SequenceDescription : 



Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MSLVIATPQL LATAALDLAS I GSQVS AANA AAAMPTTEW AAAADEVSAA IAGLFGAHAR 60 

QYQALSVQVA AFHEQFVQAL TAAAGRYAST EAAVERSLLG AVNAPTEALL GRPL I GNGAD 120 

GTAPGQPGAA GGLLFGNGGN GAAGGFGQTG GSGGAAGLIG NGGNGGAGGT GAAGGAGGNG 180 

65 GWLWGNGGNG GVGGTSVAAG IGGAGGNGGN AGL FGHGGAG GTGGAGLAGA NGVNPTPGPA 24 0 

ASTGDSPADV SGIGDQTGGD GGTGGHGTAG TPTGGTGGDG ATATAGSGKA TGGAGGDGGT 30 0 

AAAGGGGGNG GDGGVAQGDI ASAFGGDGGN GSDGVAAGSG GGSGGAGGGA FVHIATATST 360
GGSGGFGGNG AASAASGADG GAGGAGGNGG AGGLLFGDGG NGGAGGAGGI GGDGATGGPG 420
GSGGNAGIAR FDSPDPEAEP DWGGKGGDG GKGGSGLGVG GAGGTGGAGG NGGAGGLLFG 480
MGGNGGNAGA GGDGGAGVAG GVGGNGGGGG TATFHEDPVA GWAVGGVGG DGGSGGSSLG 540
VGGVGGAGGV GGKGGASGML IGNGGNGGSG GVGGAGGVGG AGGDGGNGGS GGNASTFGDE 600
NSIGGAGGTG GNGGNGANGG NGGAGGIAGG AGGSGGFLSG AAGVSGADGI GGAGGAGGAG 660
GAGGSGGEAG AGGLTNGPGS PGVSGTEGMA GAPG 694
<212> Type : PRT
<211> Length : 694

SequenceName : SEQ ID 158
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 

<400> PreSequenceString :

MSFVIAAPEV IAAAATDLAS LESSIAAANA AAAANTTALL AAGADEVSTA VAALFGAHGQ 60 
AYQALSAQAQ AFHAQFVQAL TSGGGAYAAA EAAATSPLLA PINEFFLANT GRPLIGNGTN 120 
GAPGTGANGG DGGWLIGNGG AGGS GAAGVN GGAGGNGGAG GL I GNGGAGG AGGRAS TGTG 180 
GAGGAGGAAG MLFGAAGVGG PGGFAAAFGA TGGAGGAGGN GGLFADGGVG GAGGATDAGT 240 

20 GGAGGSGGNG GLF GAGGTGG PGGFGI FGGG AGGDGGSGGL FGAGGTGGSG GTSIINVGGN 3 00 

GGAGGDAGML SLGAAGGAGG SGGSNPDGGG GAGGI GGDGG TLFGSGGAGG VCGL GFDAGG 3 60 

AGGAGGKAGL LIGAGGAGGA GGGS FAGAGG TGGAGGAPGL VGNAGNGGNG GASANGAGAA 420 
GGAGGSGVLI GNGGNGGSGG TGAPAGTAGA GGLGGQLLGR DGFNAPAS TP LHTLQQQILN 480 
AINEPTQALT GRPLIGNGAN GTPGTGADGG AGGWLFGNGG NGGHGATGAD GGDGGSGGAG 540 

25 GIIiSGIGGTG GSGGIGTTGQ GGTGGTGGAA LLIGSGGTGG SGGFGLDTGG AGGRGGDAGL 600 
FLGAAGTGGQ AALSQNFIGA GGTAGAGGTG GLFANGGAGG AGGFGANGGT GGNGLLFGAG 660 
GTGGAGTLGA DGGAGGHGGL FGAGGTGGAG GSSGGTFGGN GGSGGNAGLL ALG AS GGAGG 720 
SGGSALNVGG TGGVGGNGGS GGSLFGFGGA GGTGGSSGIG SSGGTGGDGG TAGVFGNGGD 780 
GGAGGFGADT GGNS S S VP3STA VLIGNGGNGG NGGKAGGTPG AGGTSGLIIG ENGLNGL 837 

<212> Type : PRT
<211> Length : 837

SequenceName : SEQ ID 159
SequenceDescription :

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :

MSFVIAVPET IAAAATDLAD LGSTIAGANA AAAANTTSLL AAGADEISAA IAALFGAHGR 60

AYQAASAEAA AFHGRFVQAL TTGGGAYAAA EAAAVTPLLN SINAPVLAAT GRPLIGNGAN 120 

GAPGTGANGG DAGWLIGNGG AGGS GAKGAN GGAGGPGGAA GLFGNGGAGG AGGTATANNG 180 

I GGAGGAGGS AMLFGAGGAG GAGGAATSLV GGIGGTGGTG GNAGMLAGAA GAGGAGGFSF 240 

STAGGAGGAG GAGGLFTTGG VGGAGGQGHT GGAGGAGGAG GL F GAGGMGG AGGFGDHGTL 3 00 

45 GTGGAGGDGG GGGLFGAGGD GGAGGSGLTT GGAAGNGGNA GTLSLGAAGG AGGTGGAGGT 3 60 

VFGGGKGGAG GAGGNAGMLF GSGGGGGTGG FGFAAGGQGG VGGSAGMLSG SGGSGGAGGS 420 

GGPAGTAAGG AGGAGGAPGL I GNGGNGGNG GESGGTGGVG GAGGNAVL I G NGGEGGI GAL 480 

AGKS GFGGFG GLLLGADGYN APESTSPWHN LQQDILSFIN EPTEALTGRP LIGNGDSGTP 540 

GTGDD GGAGG WLFGNGGNGG AGAAGTNGSA GGAGGAGGIL FGTGGAGGAG GVGTAGAGGA 600 

50 GGAGGS AFL I GSGGTGGVGG AATTTGGVGG AGGNAGLL I G AAGLGGCGGG AFTAGVTTGG 660 

AGGTGGAAGL FANGGAGGAG GTGSTAGGAG GAGGAGGLYA HGGTGGPGGN GGS TGAGGTG 72 0 

GAGGPGGLYG AGGS GGAGGH GGMAGGGGGV GGNAGSLTLN AS GGAGGS GG SSLSGKAGAG 780 

GAGGSAGLFY GSGGAGGNGG YSLNGTGGDG GTGGAGQITG LRSGFGGAGG AGGAS DTGAG 840 

GNGGAGGKAG LYGNGGDGGA GGDGATSGKG GAGGNAWIG NGGNGGNAGK AGGTAGAGGA 90 0 

55 GGLVLGRDGQ HGLT 914 
<212> Type : PRT 
<211> Length : 914 

SequenceName : SEQ ID 160 
SequenceDescription : 


Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :
MSLVIVTPET VAAAASDVAR IGSSIGVANS AAAGSTTSVL AAGADEVSAA IATLFGSHAR 60
EYQAISTQVA AFHDRFAQTL SAAVGSYVSA EATNAAPLAT LEHNVLNALN APTQALLGRP 120
LIGDGAAGAP GTGQAGGAGG ILWGNGGAGG SGAPGQVGGA GGAAGLFGTG GAGGAGGAGA 180
AGGAGGSGGW LLGNGGVGGA GGQSLLGGAT GGAGGNAGLF GVGGTGGPGG PGGPGGVGGT 240

GGAGGLGGTL YGAGGHGGAG GPGPIGGVGG HGGVGGAAGL LGVGGHGGAG GHGAEGVAGA 3 00 

AGEDLSPHGT SGGVGGDAGD GGTGGRGGWL AGAGGAGGAG GVGGTGGAGG AGFSRALIVA 3 60 

GDNGGDGGNG GMGGAGGAGG PGGAGGLISL LGGQGAGGAG GTGGAGGVGG DRGAGGPGNQ 420 

5 AFNAGAGGAG GHGGDPGAGG AGGTGGAGS I TGAQGAIGAT PTSGGNGGAG GNGANATTAG 480 

TNGANGGPGG HGGLVGNGGA GGNGANGAAG TNASDSGAVG GKGNS GGNGG QGGAGGDGGT 540 

LAGNGGAGGT GGRGADGGLG GSGAEGANAT TAGERGQDGG KGGNGGVGGT GGNAVAPGAN 60 0 

GGHGGNGGNP GFSGAGGLGG LSGDGVTRAA QGATPDFADT GGKGGNGGNG ANAVAPGGTG 660 

AS GGAGGNAG AGGKGGEKTII GDGGGGNGGA GGKGGAGTLL GLTVFGDNGG AGVLGDSTDP 720 

10 DGSGGAGGAG GAGGAGGD 3? T I 741 



<212> Type : PRT 

<211> Length : 741 

SequenceName : SEQ ID 161
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

20 MS FVTAAPEM LATAAQNV1AN IGTSLSAANA TAAASTTSVL AAGADEVSQA IARLFSDYAT 60 

HYQSLNAQAA AFHHSFVQTL NAAGGAYSSA EAANASAQAL EQNLLAVINA PAQALFGRPL 120 

IGNGANGTAA SPNGGDGGIL YGNGGNGFSQ TTAGVAGGAG GSAGLIGNGG NGGAGGAGAA 180 

GGAGGAGGWL LGNGGAGGPG GPTDVPAGTG GAGGAGGDAP LIGWGGNGGP GGFAAFGNGG 240 

AGGNGGASGS Ii FGVGGAGGV GGSSEDVGGT GGAGGAGRGL FLGLGGDGGA GGT SNNNGGD 3 00 

25 GGAGGTAGGR LFSLGGDGGN GGAGTA I GSN AGDGGAGGDS SALIGYAQGG SGGLGGFGES 3 60 

TGGDGGLGGA GAVL I GTGVG GFGGLGGGSN GTGGAGGAGG TGATLIGLGA GGGGGI GGFA 42 0 

VNVGNGVGGL GGQGGQGAAL I GLGAGGAGG AGGATWGLG GNGGDGGDGG GLFSIGVGGD 480 

GGNAGNGAMP ANGGNGGKTAG VIANGSFAPS FVGFGGNGGN GVNGGTGGSG GILFGANGAN 540 

GPS 543 

<212> Type : PRT

<211> Length : 543

SequenceName : SEQ ID 162
SequenceDescription :

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MSYVIiATPEM VAAAANNXiAQ IGSTLSAANA AALAPTTGVL AAGADEVSAA VASLFSGHAQ 60 

40 AYQTLGTQAA AFHERFIQAIi STAAGAYGSA EAANASPLQQ ALNVINAPTQ TLLGRPLIGN 12 0 

GTNGAPGTGQ AGGPGGLL»YG NGGNGGSGGV GQAGGAGGSA GIiIGI GGTGG AGGAGAVGGV 180 

GGNGGWLYGN GGAGGLGGTG VAGVNGGMGA AGGAGGNAYL FGSGGAGGQG GMGAAGADGV 240 

NPTPTGTADA GS TGTDQTXjG GNAIGGNGGP GDAGDAMTSG GAGGSGGNAV STVNGDAVGG 3 00 

EGGKGGEGAY GGAGGAGGSA AS IGNAAIGG NGGAGGNAQA PGGVGGAGGE GGDAQVGTNS 3 60 

45 PSNAEAGNGG SGGNGFDSFA SGGTGGAGGT GGAGGRGGLL I GDGGAGGAG GVGGTGGSGA 42 0 

PGGGGGAGGD GGAANTDSAG SSRKAFGGDG GVGGDGASAL GTGGEGGIGG QGGNGGAGGL 480 

L I GNGGAGGV GGTAGAGGTG GSGGAGGAGG AGGGGTNSGP GAAFGGNGNT GGNGGNGGAP 540 

GALGGKGGSG GL I GRAGS X) G GVGAGGAGGA GGAGGTGGEG GTGGDGKTTD GNPGMGGSPG 600 

SAGQPG 606 
<212> Type : PRT

<211> Length : 606 

SequenceName : SEQ ID 163 
SequenceDescription : 

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :

MSFLFAQPEM LGAAATDIxAS I GS AI S TANA AAAAATTRVL, AAGADEVSAA VAALFSGHAQ 60 

60 TYQALRTQAA AFHQQIVQTL TSTAGAYASA EAANVEQQLL GAINAPTMAL LGRPL IGHGA 12 0 

DGAPGTGQAG GAGGI L YG3SFG GNGGS GATGQ AGGAGGAAGL IGHGGAGGLG GTGAS GGAGG 18 0 

AGGWLWGNGG AGGNGGVGVA GDPGGVGGAG GAGGAAGLWG SGGSGGTGGQ GGVGGGKSGD 240 

GGTGGIGGAG GGGGWLHGDG GAGGHGGQGG TGVSS GGNGG AGGTGGD GRG LSGSGGAGGR 3 00 

GGQTGVGGKV GENNFGGAGG AGGTGGL I GN GGAGGNGGQG AISGAGGAGG NAWLIGDGGA 3 60 

65 GGNGGDIRGQ GGGAGGAGGA GGQLIGNGGT GGAGGTVTSP NGL GGAGGAG GSAGL IGHGG 42 0 

TGGAGGHSAQ GPDGNGGX GG AGGAGGNGGQ LYGTGGTGGT GGKGGDGFGV FGKGGAGGTG 480 

GRGGAAGLIG DAGTGGTGGK GGTAGEDGTG GNGGTGGNGG AAVLIGNGGG GGAGGNGGAG 540
NDGTPGNGGG GGVGGTGGTL FGQPGQPGPP GQPGPA 576
<212> Type : PRT
<211> Length : 576

SequenceName : SEQ ID 164 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MWTSQMIVAP AFVDAAAKDL ATIGSAISRA NAEALVPITA LLPAGADDVS AAIAALFATH 60 
GQAYQELSAH AVAFHEQ FVQ LMSAGAAQYA SAEAANSSPL Q I VGQTALDA INSPVQTLTG 120 
RPLIGNGANG VAGTGQNGGD GGWLYGNGGN GGS GGTGQNG GNGGSAGLWG SGGNGGQGGA 180 
GANGAAGQPG KAGGSGGNGG AGGW I YGHGG HGGAGGNGGN ATAPGGASAG FDGGAGGNGG 240 

15 SGGRGGLLFG NGGNGSVGGM GGQGTNDTAG DSAGSGGLGG NGGNGAQGGW LIGNGGQGGD 3 00 

SGAGGGTDST QTGVMNGASG GS AG I AGNGG DAGLVGNGGA GGNGGNGAAG SALGTTIFGG 3 60 

SGGVGGSGGD GGISTGGWLFGS GAS GGNGGQG GDAGTNGFAG FGGSAGGGGW VGAVNFGPIS 420 
VQGFGLFGHG GDGGNGGDVG AGSLSIQFGA SGGDGGQGGV LYGNGGNGGN AGS GGGTGF E 480 
GSAGQGGAAI LIGNGGAGGN GATGGTGVGN IIQEAGGDGS DGGAGGS GGL LFGSGGAGGI 540 

20 GGAGGVGGSG NDGGNGGDGG QGGASGLGIG NGGPGGS GGT GGAGGTGGSA GTGGAGGDGG 6 00 

NAALLIGTGG DGGDGVPPAP GGQGGKGGL I GLPGQNGQP 63 9 

<212> Type : PRT 
<211> Length : 639 

SequenceName : SEQ ID 165 

SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MSWVMVSPEL WAAAADLAG IGSAISSANA AAAVNTTGLL TAGADEVSTA IAALFGAQGQ 60 

AYQAASAQAA AFYAQFVQAL SAGGGAYAAA EAAAVS PLL A PINAQFVAAT GRPLIGNGAN 120 

GAP GTGANGG PGGWLIGNGG AGGS GAPGAG AGGNGGAGGL FGSGGAGGAS TDVAGGAGGA 180 

GGAGGNAGML FGAAGVGGVG GFSNGGATGG AGGAGGAGGL FGAGRERGSG GSGNLTGGAG 240 

35 GAGGNAGTLA TGDGGAGGTG GASRSGGFGG AGGAGGDAGM FFGSGGSGGA GGISKSVGDS 3 00 

AAGGAGGAPG LIGNGGNGGN GGAS TGGGDG GPGGAGGTGV LIGNGGNGGS GGTGATLGKA 360 

GIGGTGGVLL GLDGFTAPAS TSPLHTLQQD VINMVNDPFQ TLTGRPL I GN GANGTPGTGA 420 

DGGAGGWLFG NGGNGGQGTI GGVNGGAGGA GGAGGI LFGT GGTGGSGGPG ATGLGGIGGA 480 

GGAALLFGSG GAGGSGGAGA VGGNGGAGGN AGALLGAAGA GGAGGAGAVG GNGGAGGNGG 540 

40 LFANGGAGGP GGFGSPAGAG GIGGAGGNGG LFGAGGTGGA GGGSTLAGGA GGAGGNGGLF 600 

GAGGTGGAGS HSTAAGVSGG AGGAGGDAGL LSLGASGGAG GSGGSSLTAA GWGGIGGAG 660 

GLLFGSGGAG GSGGFSNSGN GGAGGAGGDA GLLVGSGGAG GAGASATGAA TGGDGGAGGK 720 

SGAFGLGGDG GAGGATGLSG AFHI GGKGGV GGSAVLIGNG GNGGNGGNSG NAGKSGGAPG 780 

PS GAGGAGGL LLGENGLNGL M 8 01 

<212> Type : PRT

<211> Length : 801

SequenceName : SEQ ID 166 
SequenceDescription : 

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

GQSYQAVSAQ AAAFHDRFVQ LLNAGGGSYA SAEIANAQQN LLNAVNAPTQ TLLGRPLVGD 60 
55 GADGASGPVG QPGGDGGILW GNGGNGGDST SPGVAGGAGG SAGLIGNGGR GGNGAPGGAG 120 
GNGGLGGLLL GNGGAGGVGG TGDNGVGDLG AGGGGGD GGL GGRAGL I GHG GAGGNGGDGG 180 
HGGSGKAGGS GGSGGFGQFG GAGGLLYGNG GAAGS GGNGG DAGTGVS SDG FAGLGGSGGR 240 
GGDAGLIGVG GGGGGNGGDP GLGARL FQVG SRGGDGGVGG WLYGDGGGGG DGGNGGLPFI 3 00 

GSTNAGNGGS ARLIGNGGAG GSGGSGAPGS VSSGGVGGAG NPGGS GGNGG VWYGNGGAGG 3 60 

60 AAGQGGPGMN TTSPGGPGGV GGHGGTAILF GDGGAGGAGA AGGP GTPDGA AGPGGS GGTG 420 
GLLFGVPGPS GP3DG 434 
<212> Type : PRT 
<211> Length : 434

SequenceName : SEQ ID 167
SequenceDescription :

Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MAHFSVLPPE INSLRMYLGA GSAPMLQAAA AWDGLAAELG TAASSFSSVT TGLTGQAWQG 60 
5 PASAAMAAAA APYAGFLTTA SAQAQLAAGQ AKAVASVFEA AKAAIVPPAA VAANREAFLA . 12 0 

LIRSNWLGLN APWIAAVESL YEEYWAADVA AMTGYHAGAS QAAAQLPLPA GLQQFLNTLP 18 0 

NLGIGNQGNA NLGGGNTGSG NIGJTGNKGSS NLGGGNIGNN NIGSGNRGSD NFGAGNVGTG 240 

NIGFGNQGPI DVNLLATPGQ NNVGLGNIGN NNMGFGNTGD ANTGGGNTGN GNI GGGNTGN 30 0 

NNFGFGNTGN NNIGIGLTGN NQMG X NLAGL LNSGSGNIGI GNSGTNNIGL FNSGSGNIGV 360 

10 FNTGANTLVP GDLNNLGVGN SGNAETIGFGN AGVLNTGFGN ASILNTGLGN AGELNTGFGN 420 

AGFVNTGFDN SGNVNTGNGN SGNIltfTGSWN AGNVNTGFGI ITDSGLTNSG FGNTGTDVSG 48 0 

FFNTPTGPLA VDVSGFFNTA SGGTVINGQT SGIGNIGVPG TLFGSVRSGL NTGLFNMGTA 540 

ISGLFNLRQL LG 552 
<212> Type : PRT
<211> Length : 552

SequenceName : SEQ ID 168
SequenceDescription :



Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MS FL IAS PEA LAATATYLTG IGSAISAANA VAAAPTTEIL AAGTDEVSTA ISALFGAHAQ 60 
A YQAL S AHVA AFHDQFVHTL TAGAGSYMAA EAAAASPLQA LQLELLNAIN APTLALLGRP 120 

25 LIGDGTDAAP GSGGAGGAGG ILIGNGGTGG ASDLAGTGRG GVGGAGGAGG LFGIGGAGGG 18 0 

CGSAVAIGGD GGAGGAGGVF SGGGAGGAGD AIGGSGGAGG TGGLLGGGGG AGGAGGAGGN 240 
GGGASNSASI GGDGGS GGAG GMLYGAGGVG GNGGAAVAIG GDGGAGGRAG AIGNGGDGGN 3 00 

GGTSNTPGGS GGDGGNGGNA GLIGNGGNGG NAEIVISGGS VAGTGGNGGL LLGFNGTNGL 3 60 

p 361 

<212> Type : PRT

<211> Length : 361 

SequenceName : SEQ ID 169 
SequenceDescription :

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString : 

AQAS PAAHGG SGGAGGNGGA GSAGNGGAGG AGGNGGAGGN GGGGDAGNAG SGGNGGKGGD 60 
40 GVGPGSTGGA GGKGGAGANG GSSKTGNARGG NAGNGGHGGA GGS GDTGGAG GAGGQ GGFGG 120 
TGGSGSGIGG GAGGNGGNGG AGGTGWLGG KGGDGGNGDH GGPATNPGSG SRGGAGGSGG 180 
NGGAGGNATG SGGKGGAGGN GGDGSFGATS GPAS I GVTGA PGGNGGKGGA GGSNPNGSGG 24 0 

DGGKGGNGGA GGNGGSIGAN SGIVGGSGGA GGAGGAGGNG SLSSGEGGKG GDGGHGGDGV 300 
GGNSSVTQGG SGGGGGAGGA GGSGFFGGKG GFGGDGGQGG PNGGGTVGTV AGGGGNGGVG 3 60 

45 GRGGDGVFAG AGGQGGLGGQ GGNGGGSTGG NGGLGGAGGG GGNAPDGGFG GNGGKGGQGG 420 
IGGGTQSATG LGGDGGDGGD GGNGGNSGAK AGGAGGKGQA GQPNSGTEPG FGGDGGLGGA 48 0 

GATP 484 
<212> Type : PRT 
<211> Length : 484 
SequenceName : SEQ ID 170

SequenceDescription : 



Sequence 



<213> OrganismName : Rickettsia prowazekii
<400> PreSequenceString :

MKKSKILRKF LATASLCGTL FTNSNATGTI IPNNGSVSLN TDAGLVGGVF NNGDIIQIW 60
GGREIKISAD KANAIIGGIN TLKELPDFGG VEVSQNVSIG PLNAGEDLNT NFGPLKFISN 120
NVTSIITGVG TKTFSNIDFA GKNATLQINK DLNITTKIDN TVAGNNGSIT FEGSGIISNH 180
IGYTNSLLGI NVGNGEAKIY APEA2SJNITIN AKNINLTHNN SILTLCDGNI TTLKGNINNT 240
TEIDGQGILN LAYDLGSSSI ITGDIGNIGS LDTINVLLGS ATFNSTILKA TNINLKHNTS 300
TLNLDDNIIV IGNIKGNNNK DILNTFKVHGT NLDNEMIIPA PQKTHGTLNF KGNATLNGNI 360
NNLNILKFSG GHGKTLNLQG NTKVDNLVFA DSVLDSGTIS VNGLLDTDCV TFNNSNVNGG 420
TLIINAKNTI SAKLLNATKA KIQIiNANLTM NHPSAGDISD IRIADNTIYT IDAKNGNVNL 480
LNNNAKIIFE GADSMLALIN TGVTADRTFT IYNNLNQSGN DEYGIVKIEA IKKVTTIANQ 540
SGPYTIGQDN THRLKELIVE GAGDIIIDDT IFTKLLSINS TGQITFNRTL DLGAGGNIAF 600
GKHGTLWNG VTGSITTSEN NQGILTINSG NITGVIGTNE LGLKLVNIGA DPVTCSANVF 660
ASVALTNPSS VLILADGVTL TGEVTTKNNT KGVLSLGTGS NXTGQIGTNS AALEKINIGA 720
GASNIDSNIY AGSTVLTDQT SELTLNKTDW VNSNIITTAG NNSGKLIFTG NGGITGNIGA 780
NGAALQEWF NGTTNIGGTA NSQNFTVAHS AANWITGLT TGALKYKDTG T11AHGGLVG 840
DIDFNNKAGK FILGDGAMID GSVLCNGGVA GTLDFIGDGN VTQNIGADNA NSISTINIQG 900
DNTKNVTIAN DIFVDNIHFT NGGILQLtGGN LTTHNIDFGA NGGTLEFNGN NTYNLNAIIV 960
NGQNGILNAF TNLKASDDTI GTVKIIXTIGQ IGTPQNFTIQ VNNKNLTLVS SWSSINFGD 1020
ANSQLILSAP VDQTIKFINN LNETGGGIIT LDSNGNNLTI SGNNGIKLGS KGNELSSLNI 1080
KGKVTVTNDL DIQNIHQLNI NNGALFDDQS LTSAKIKNIN IGTVAGGATY TLDAINDNFD 1140
LNTSGMVFKH QDSILELKNS SNTNDHTITL TSABDPGNNQ FGIIKIiITDT NKLTIDNNGM 1200
VAYTLGTANH MLKQLTFASI DNGAIALKVG INVENVTLNI KDIELNEVNA NVIiFNKNTTY 1260
TATGNINGHV DFQGNAGVIN LNDDIEIDGS VTSTGNVNGT LNFNGSGKVT GLINNIVMLQ 1320
AGAGDVSLSA SGNYSITEIQ GNGNNNLTFA ANSHLTTDIN KTGGQDLNLV FINGGSVSGS 1380
IGANAAVGDI IINAGSVNFS NTLKSGNIVI SDGATMQVNN MVTATDISGK NANNGTLKLN 1440
NHTPINITST LGNNNAIGTI EVANNDVTIT GTLQAQNIHF SNATQAATIiT LGAASQVTNJ 1500
TTAGNNIHTL EVTDFDTGND G 1 1 GD AN1STRL KSIELTGNGT VTINSPHVYS SITTANNAQG 1560
NVKLNIEGGI TYDIiGSKIKS LANVQISEDT TIRGDVYSKY LNIDAGKTIN FDRGDNNMNP 1620
KNLDIPDALI DLDVLPRSLS LFNYFTDIKA DNLNFADDTA TANFKDAWI DAHIDNGGIL 1680
KFNDNAWLTQ EIKNANIIEI ASDKFMLLQK NIKAATLIAD NANLVLLDNV EVNTNLNVRD 1740
IVLDLANYEL KYTGNVTHNG LIiTI ITY"FDT ALQKGGHILV SQGSNVDMSD LDNLIIKIKA 1800
HSDITNITSD TKHQIVKLET GAIYTPVPQT KVIIDASEEQ NKFVKWVADA NGLVLLTDTG 1860
GRDDTGGRDD TRGRGNTDNG CRDNCDVGNI SNNSSNEAGG SSSDKNYGIT DWPIFDPSP 1920
ILDYTKNNYV ASGIANQLIN HVKDFGNTTD AGKLLNDLGF MSPNRVTETTj DRLSNRINVN 1980
GLNEGWGLN GIEVENFLTD IAINMDISTFTA KEIGNRLEEL SDANTVNGLM RTNTLLNNKI 2040
NLKRLNTNNQ AIIAAGDEDN IVTGIWGMSF YGKIKQNSKM SASGYQSNTG GGIIGFDYNI 2100
DNSIVIGAAY TMADSKVKHK NDKNGDRTKA KSNIYSIYGL YNWLTNNFFV EAIGVYGRNK 2160
IKNYEKRITT ITDQIAIGKF INTFYSYELL GGYNYLISHR TTITPMFGMR YATFKNNGYK 2220
ENNTTFQMLS IKKNYYDKFE TILGLNSVTH YLSQDIIIKP EliHWFINYQC KNKLPNIDAR 2280
LDGIDEPLTT IRFKPAKITY NLGGGISTKN NMIEFGIRYN LSLAKKYTAH QGSLKIKVNL 2340

<212> Type : PRT

<211> Length : 2340

SequenceName : SEQ ID 171
SequenceDescription :

Sequence



<213> OrganismName : Rickettsia prowazekii
<400> PreSequenceString :

MAQKPNFLKK IISAGLVTAS TATIVAGFSG VAMGAAMQYN RTTNAAATTF DGIGFDQAAG 60 

40 ANIPVAPNSV ITANANNPIT FNTPMGHXNS IiFLDTANDLA VTINEDTTLG FITNIAQQAK 120 

FFNFTVAAGK ILNITGQGIT VQEASNTXNA QNALTKVHGG AAINANDLSG LGSITFAAAP 180 

SVLEFNLINP TTQEAPLTLG ANSKIVNGGN GTLNITNGFI QVSDNTFAGI KTINIDDCQG 240 

LMFNSTPDAA NTLNLQVGGN TINFNGXDGT GKLVLVSKMG AATEFNVTGT LGGNLKGIIE 3 00 

LNTAAVAGKL ISQGGAANAV I GTDNGAGRA AGFIVSVDNG NAATISGQVY AKNMVIQSAN 3 60 

45 AGGQVTFEHI VDVGLGGTTN FKTADSKVII TENSNFGSTN FGNLDTQIW PDTKILKGNF 420 

I GD VKNNGNT AGVI TFNANG ALVSASTDPN IAVTNINAIE AEGAGWELS GIHIAELRLG 48 0 

NGGS I FKLAD GTVINGPVNQ NALMNNMALA AGS I QLDGS A 1 1 TGDIGNGG VNAALQHITL 540 

ANDASKILAL DGANI I GANV GGAIHFQANG GTIKLTNTQN NIWNFDLDI TTDKTGWDA 600 

SSLTNNQTLT INGSIGTWA NTKTLAQLNI GSSKTILNAG DVAINELVIE NNGSVQLNHN" 660 

50 TYLITKTINA ANQGQIIVAA DPLNTNTTLA DGTNLGSAEN PLSTIHFATK AANADSILNV 72 0 

GKGVNLYANN ITTNDANVGS IiHFRSGGTSI VSGTVGGQQG HKLNNLILDN GTTVKFLGDT 780 

TFNGGTKIEG KSILQISNNY TTDHVESADN TGTLEFVNTD PITVTLNKQG AYFGVLKQVI 840 

ISGPGNIVFN EIGNVGIVHG IAANSISFEN ASLGTSLFLP SGTPLDVLTI KSTVGNGTVD 900 

NFNAPIVWS GIDSMINNGQ IIGDKKNTIA LSLGSDNSIT VNANTLYSGI RTTKNNQGTV 960 

55 TLSGGMPNNP GTIYGLGLEN GSPKLKQVTF TTD YNNLGS I IANNVTINDY VTLTTGGIAG 102 0 

TDFDAKITLG SVNGNANVRF VDSTFSDPRS MIVATQANKG TVTYLGNALV SNIGSLDTPV 1080 

ASVRFTGNDS GAGLQGNIYS QNIDFGTYNL TILNSNVILG GGTTAINGEI DLLTNNL I FA 114 0 

NGTSTWGDNT SISTTLNVSS GNIGQWIAE DAQVNATT T G TTTIKIQDNA NANFSGTQAY 1200 

TLIQGGARFN GTLGAPNFAV TGSNIFVKYE LIRDSNQDYV LTRTNDVLNV VTTAVGNSAI 1260 

60 ANAPGVSQNI SRCLESTNTA AYNNMLLAKD PSDVATFVGA IATDTSAAVT TVNLNDTQKT 132 0 

QDLLSNRLGT LRYL SNAETS DVAGSATGAV SSGDEAEVSY GVWAKPFYNI AEQDKKGGIA 13 8 0 

GYKAKTTGW VGLDTLASDN LMIGAAIGIT KTDIKHQDYK KGDKTDINGL SFSLYGSQQL 1440 

VKNFFAQGNA IFTLNKVKSK SQRYFFESNG KMSKQIAAGN YDNMTFGGNL IFGYDYNAMP 15 0 0 

NVLVT PMAGL SYLKSSNENY KETGTTVANK RINSKFSDRV DL I VGAKVAG STVNITDIVI 1560 

65 YPEIHSFWH KVNGKIiSNSQ SMLDGQTAPF I S Q PDRTAKT SYNIGLSANI KSDAKMEYGI 162 0 

GYDFNSASKY TAHQGTLKVR VNF 1643 
<212> Type : PRT
<211> Length : 1643

SequenceName : SEQ ID 172
SequenceDescription :

Sequence

<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString : 

MARIILEAHD VWEDGTGYQM LWDADHNQYG ASIPEESFWF ANGTI PAGLY DPFEYKVPVN 60 

10 ADAS FS PTNF VLDGTAS AD I PAGTYDYVII NPNPGIIYIV GEGVSKGNDY WEAGKTYHF 120 

TVQRQGPGDA ASWVTGEGG NEFAPVQNLQ WSVSGQTVTL TWQAPASDKR TYVLNESFDT 18 0 

QTLPNGWTMI DAD GDGHNWL STINVYNTAT HTGDGAMFSK SWTASSGAKI DLSPDNYLVT 24 0 

PKFTVPENGK LSYWVSSQEP WTNEHYGVFL S TTGNEAANF TIKLLEETIiG SGKPAPMNLV 3 00 
KSEGVKAPAP- YQERTIDLSA YAGQQVYLAF RHFGCTGIFR LYLDDVAVSG EGSSNDYTYT „ 360 

15 VYRDNWIAQ NLTATTFNQE NVAPGQYNYC VEVKYTAGVS PKVCKDVTVE GSNEFAPVQN 42 0 

LTGSAVGQKV TLKWDAPNGT PNPNPGTTTL SESFENGIPA SWKTIDADGD GNNWTTTPPP 48 0 

GGSSFAGHNS AICVSSASYI NFEGPQNPDN YLVTPELSLP NGGTLTFWVC AQDANYASEH 540 

YAVYASSTGN DASNFANALL EEVLTAKTW TAP EAI RGTR VQGTWYQKTV QLPAGTKYVA 60 0 

FRHFGCTDFF WINLDDVEIK ANGKRADFTE TFESSTHGEA PAEWTTIDAD GDGQGWLCLS 660 

20 SGQLGWLTAH GGTNWASFS WNGMALNPDN YLISKDVTGA TKVKYYYAVN DGFPGDHYAV 72 0 

MIS KTGTNAG DFTWFEETP NGINKGGARF GLSTEANGAK PQSVWIERTV DLPAGTKYVA 780 

FRHYNCSDLN YILLDDIQFT MGGSPTPTDY TYTVYRDGTK IKEGLTETTF EEDGVATGNH 84 0 

EYCVEVKYTA GVSPKECVNV TVDPVQFNPV QNLTGSAVGQ KVTLKWDAPN GTPNPNPGTT 90 0 

TLSESFENGI PASWKTIDAD GDGNNWTTTP PPGGTSFAGH NSAICVSSAS YINFEGPQNP 960 

25 DNYLVTPELS LPNGGTLTFW VCAQDANYAS EHYAVYASST GNDASNFANA LLEEVLTAKT 102 0 

WTAPEAIRG TRVQGTWYQK TVQLPAGTKY VAFRHFGCTD FFWINLDDVE IKANGKRADF 108 0 

TETFESSTHG EAPAEWTTID ADGDGQGWLC LSSGQLDWLT AHGGTNWAS FSWNGMALNP 1140 

DNYLI S KDVT GATKVKYYYA VNDGFPGDHY AVMISKTGTN AGDFTWFEE TPNGINKGGA 1200 

RFGLSTEANG AKPQSVWIER TVDLPAGTKY VAFRHYNCSD LNYILLDDIQ FTMGGSPTPT 1260 

30 DYTYTVYRDG TKIKEGLTET TFEEDGVATG NHEYCVEVKY TAGVSPKECV NVTVDPVQFN 132 0 

PVQNLTGSAV GQKVTLKWDA PNGTPNPNPG TTTLSESFEN GIPASWKTID ADGDGNNWTT 13 8 0 

TPPPGGTSFA GHNSAICVSS ASYINFEGPQ NPDNYLVTPE LSLPNGGTLT FWVCAQDANY 1440 

AS EHYAVYAS STGNDASNFA NALLEEVLTA KTWTAPEAI RGTRVQGTWY QKTVQLPAGT 150 0 

KYVAFRHFGC TDFFWINLDD VEIKANGKRA DFTETFESST HGEAPAEWTT IDADGDGQGW 1560 

35 LCLSSGQLGW LTAHGGTNW ASFSWNGMAL NPDNYLISKD VTGATKVKYY YAVNDGFPGD 1620 

HYAVMISKTG TNAGDFTWF EETPNGINKG GARFGLSTEA NGAKPQSVWI ERTVDLPAGT 1680 

KYVAFRHYNC SDLNYILLDD IQFTMGGSPT PTDYTYTVYR DGTKIKEGDT ETTFEEDGVA 1740 

TGNHEYCVEV KYTAGVSPKK CVNVTINPTQ FNPVQNLTAE QAPMSMDAIL KWNAPASKRA 180 0 

EVLNEDFENG IPASWKTIDA DGDGNNWTTT PPPGGSSFAG HNSAICVSSA SYINFEGPQN 1860 

40 PDNYLVTPEL SLPGGGTLTF WVCAQDANYA SEHYAVYASS TGNDASNFAN ALLEEVLTAK 192 0 

TWTAPEAIR GTRVQGTWYQ KTVQL PAGTK YVAFRHFGCT DFFWINLDDV VITSGNAPSY 198 0 

TYTIYRNNTQ IASGVTETTY RDPDLATGFY TYGVKWYPN GESAIETATL NITS L ADVTA 2 040 

QKPYTLTWG KTITVTCQGE AM I YDMNGRR LAAGRNTWY TAQGGHYAVM VWDGKSYVE 210 0 

KLAVK 2105 

<212> Type : PRT

<211> Length : 2105 

SequenceName : SEQ ID 173 
SequenceDescription : 

Sequence



<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString :
MKTSERILSY FFLLCAVFSL GSCEGLYAQV TFPNYSPTAA SSIAVCSGEE TLIIDFTWQ 60
EDSNGIKVNV KLADGVEYW GTAWSVTQG NAVTVAETNV SNPNEPVFTV KSADGNNWE 120
LGTIVKLTIK RRAVCTAWSM AINAAETGFV FKDKVTVTIG DHSDSKESNS YSVNYPNLTI 180
KQPAPQWKQ IGETIVREFS ITNGSQNPTQ TVYLSIEYPD EAYLTGVGAM TLQAKLGASG 240
TYADLTPTVT NGKVRIYTLS GSSLGPDHLL TNGEIIYLKE TFKLKTCAPV TVYRVGWGCS 300
IDSQCEIKTT AATITMAAGA ANITGYSVTG PDYRSPTFSL CQPFELTIKF SNSGAGGSMG 360
AAFNINTIGR NDYYRPRGFV LHEFIDVKVN GKPVTNFKTD GSELDLRFDG QFTEDPDGPG 420
VGLDDVDGDG FYDDLPVGAT ITITVTVRLK CDQFTACNNA PNDLSDRGLI LKTLYQTSCD 480
RTSWIDPNTW FNLS STHL YLi SRESVQDASH MPTVIEKDTP FDLKIMTSYY SILSSYNNIW 540
YANPNTRYW EIVFPQGMTM PPKSDIEWTN IKNHPIDGSL VFTPPINLPD ANITTSGNTM 600
TIVSPSQERG FVTLHGVKYD CTNNHEMWE YKIREVFNYL HFPDCLCPVG PIMCNTAKRY 660
VLGCDPPCGR GAETSVPKIE RADNSLGWTD YTMRTRQSRS NISAYDLAKA LYMDEVNITA 720
TSIQHGTASS LGARFVLATG VDRVETLTPL SADIKIFRDG VQIVSVDGYT TFRSIRRNNN 780
AEQVIDWDFT SILPAGGLLD RDKVDWTRY RVTSQNAHRV DTQVGREWFF YNSTANVSPI 840
WDEANPLTCL ILVPEIYIMG TFWNGTDPH VISQCTPTDIi GRVANHYARR FGSGAFEYAN 900

EYRPGVKIRN IYLKVPKSYT LNRVEYSNHR NHSSLGTTMP FEE INHTDVT SQGEYNIYKY 960 

QLADNEKAHF NITVKNAYGA ALKVNVSPTC ASSAVATNYD KISYYVDYID YYYYAATQPT 1020 

VPNSLDIVAD QSAGSNGIYS VSALMVYNRP IL YTNKPS X A LWQSGEVEL VGKTGEWKLR 10 8 0 

ISNPSSATAP YVWLALPTTS GLTIEKVTDA AGTEMAFTTY SGGKMYRLSE AGVPVGSALD 1140 

YTIHFTYSGC SPIALKAMGG WNCSAYPLSL DEYVCSSQVT DLKLKPLPAA MELTEIAVPD 120 0 

PTAAATLCST LEYIYSIQST DNANVYS PTF SIFPEEGLW TPNQVQVEYP AGS GNWAALN 1260 

WNNSVNLLQ HPALTTIGYL KGLKEGESND NQRKILVKFY IKTECSFVSG KNFRVRADGR 132 0 

NACNQNAKGS GLAISTPPIR INGAIEPYTT SASTQLVTTT TSQSDCKAPK RVKWQTWG 13 80 

GETTPKAYLE ITLPLGFKYV TGSYAPDNTH PGGVNASPAG TEEVTLTANG EDKIKINVKA 1440 

GLTSGQSFAY TLEMKEDDDN VPACGNHTIE I VNVEE I EGL WCEGVQCAET LWTGANKFE 1500 

FELDKPYLDI TVISAVSTFS GGKENLTIEY KVSNTSTTQP LKPGAWTLF SDKDNNQVFS 15 60 

GGDVAVATQE LVAEITNTTP LTQIMKVKGV S S S HTGNIA/Xi TILPKDGCYC EIKSPMVTLN 1620 

HLPSNTOIGG TVGKPNEWKE BNNWTNDQVP DAAEDVE FAT EVNNPTDPNN PKSGPAKENL 1680 

HLDDIHQNGT AGRVIGNLIN DSDKDLVITT GNQLTINGW EDNNPNVGTI WKSSKDNPT 1740 

GTLLFANPGN NQNVGGTVEF YNQGYDCADC GMYRRSWQYF GIPVNESDFP YDHVDGNATV 18 00 

NQWVEPFNGD KWRPAPYAPD TKLQRFKGYQ ITNDVQAQPT GVYSFKGTLC VCDAFIiNLTR I860 

TSGVNYSGAN LIGNSYTGAI DIKQGIVFPP E VEQTVYLi FN TGTRD QWRKL NGS TVS GYRA 1920 

GQYLSVPKNT AGQDNLPDRI PSMHSFLVKM QNGASCTLQI LYDKLLKNTT WNGNGTQIT 1980 

WRSGNSGSAN MPSLVMDVLG NESADRLWIF TDGGLSFGFD NGWDGRKLTE KGL SQL YAMS 2040 

D I GNDKFQVA GVPELNNLLI GFDADKDGQY TLEFALSDHF AKGGVFLEDLi SRGVTRRWD 210 0 

GGSYSFDAKR GDSGARFRLS YDEEWVESAE VSVLVGTAGK RIVITNNSEH ACQ ANVYT TD 2160 

GKLL IRLDVK PGSKSMTEPL VDGVYWSLQ SPATSSNVRK VWN 2204 
<212> Type : PRT 
<211> Length : 2204 

SequenceName : SEQ ID 174 

SequenceDescription : 

Sequence 



<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString : 

MNKFYKSLLQ SGBAAFVSMA TALTASAQIS FGGEPLSF-SS RS AGTH S FDD AMTIRLTPDF 60 

NPEDLIAQSR WQSQRDGRPV RIGQVIPVDV DFASKASHIS SIGDVDVYRL QFKLEGAKAI 120 

TLYYDAFNIP EGGRLYIYTP DHEIVLGAYT NATHRRNGAF ATEPVPGSEL IMDYEVSRGG 180 

TLPDIKISGA GYIFDKVGGR PVTDNHYGIG EDDSDSDCEX NINCPEGADW QAEKNGWQM 240 

IMVKGQYXSM CSGNLLNNTK GDFTPLIISA GHCAS ITTNF GVTQS ELDKW IFTFHYEKRG 300 

CSNGTLAXFR GNSIIGASMK AFLPIKGKSD GLLLQLNDEV PLRYRVYYNG WDSTPDIPSS 360 

GAGIHHPAGD AMKI SILKKT PALNTWISSS GSGGTDDHFY FKYDQGGTEG GSSGSSLFNQ 420 

NKHWGTLTG GAGNCGGTEF YGRLNSHWNE YASDGNTSRM DIYLDPQNNG QTTILNGTYR 480 

DGYKPLPSVP RLLLQS TGDQ VELNWTAVPA DQYPSSYQVE YHIFRNGKEI ATTKELSYSD 540 

AIDESIIGSG I IRYEVSARF IYPSPLDGVE SYKDTDKTSA DLAIGDIQTK LKPDVTPLPG 600 

GGVSLSWKVP FLSQLVSRFG ESPNPVFKTF EVPYVSAAAA QTPNPPVGW IADKFMAGTY 660 

PEKAAIAAVY VMPSAPDSTF HLFLKSNTNR RLQKVTTE>SD WQAGTWLRIN LDKPFPVNND 72 0 

HMLFAGI RMP NKYKLNRAIR YVRNPDNLFS ITGKKI SYUN GVS FEGYGI P SLLGYMAIKY 7 80 

LWNTDAPKI DMSLVQEPYA KGTNVAPFPE LVGIYVYKNG TFIGTQDPSV TTYSVSDGTE 840 

SDEYEIKLVY KGSGISNGVA QIENNNAWA YPSWTDRF S IKNAHMVHAA ALYSLDGKQV 9 00 

RSWNNLRNGV TFSVQGLTAG TYMLVMQTAN GPVSQKIVKQ 940 
<212> Type : PRT 
<211> Length : 940 

SequenceName : SEQ ID 175 

SequenceDescription : 

Sequence 



<213> OrganismName : Porphyromonas gingivalis W83 
<400> PreSequenceString : 

MKNLNKFVSI ALCSSLLGGM AFAQQTELGR NPNVRLLEST QQSVTKVQFR MDNLKFTEVQ 60 

TPKGMAQVPT YTEGVNLSEK GMPTLPILSR SLAVSDTREM KVEWSSKFI EKKNVLIAPS 120 

KGMIMRNEDP KKIPYVYGKS YSQNKFFPGE IATLDDPFIL RDVRGQWNF APLQYNPVTK 180 

TLRIYTEITV AVSETSEQGK NILNKKGTFA GFEDTYKRMF MNYEPGRYTP VEEKQNGRMI 240 

VIVAKKYEGD IKDFVDWKNQ RGLRTEVKVA EDIASPVTAN AIQQFVKQEY EKEGNDLTYV 3 00 

LLIGDHKDIP AKITPGIKSD QVYGQIVGND HYNEVFIGRF SCESKEDLKT QIDRTIHYER 3 60 

NITTEDKWLG QALCIASAEG GPSADNGESD IQHENVIANL LTQYGYTKII KCYDPGVTPK 42 0 

NIIDAFNGGI SLANYTGHGS ETAWGTSHFG TTHVKQL.TNS NQLPFIFDVA CVNGDFLFSM 48 0 

PCFAEALMRA QKDGKPTGTV AIIASTINQS WASPMRGQDE MNEILCEKHP NNIKRTFGGV 540 

TMNGMFAMVE KYKKDGEKML DTWTVFGDPS LLVRTLVPTK MQVTAPAQIN LTDASVNVSC 600
DYNGAIATIS ANGKMFGSAV VENGTATINL TGLTNESTLT LTWGYNKET VIKTINTNGE 660

PNPYQPVSNL TATTQGQKVT LKWDAPSTKT NATTNTARSV DGIREtiVLLS VSDAPELLRS 720 

GQAEIVLEAH DVWNDGSGYQ ILLDADHDQY GQVIPSDTHT LWPNCSVPAN LFAPFEYTVP 78 0 

ENADPSCSPT NMIMDGTASV NIPAGTYDFA IAAPQANAKI WIAGQGPTKE DDYVFEAGKK 840 

5 YHFLMKKMGS GDGTELTISE GGGSDYTYTV YRDGTKIKEG LTATTFEEDG VAAGNHEYCV 900 

EVKYTAGVSP KVCKDVTVEG SNEFAPVQNL TGSAVGQKVT LKWDAPNGTP NPNPNPNPNP 960 

NPGTTTLSES FENGIPASWK TIDADGDGHG WKPGNAPGIA GYNSNGCVYS ESFGLGGIGV 102 0 

LTPDNYLITP ALDLPNGGKL TFWVCAQDAN YAS EHYAVYA SSTGNDASNF TNALLEETIT 108 0 

AKGVRSPEAI RGRIQGTWRQ KTVDLPAGTK YVAFRHFQST DMFYIDLDEV EIKANGKRAD 1140 

10 FTETFESSTH GEAPAEWTTI DADGDGQGWL CLSSGQLDWL TAHGGTNWS SFSWNGMALN 12 00 

PDNYLISKDV TGATKVKYYY AVNDGFPGDH YAVMISKTGT NAGDFTWFE ETPNGINKGG 12 60 

ARFGL STEAD GAKPQSVWIE RTVDLPAGTK YVAFRHYNCS DLNYILLDDI QFTMGGSPTP 1320 

TDYTYTVYRD GTKIKEGLTE TTFE EDGVAT GNHEYCVEVK YTAGVSPKKC VNVTVNSTQF 1380 

NPVKNtiKAQP DGGDWkKWE &PSAKKTEGS REVKRIGDGL FVTIEPANDV RAWAKVVLiA 1440 

15 ADNVWGDNTG YQFLLDADHN TFGSVIPATG PLFTGTASSD LYSANFEYL I PANADPWTT 1500 

QNIIVTGQGE WI PGGVYD Y CITNPEPASG KMW I AGDGGN QPARYDDFTF EAGKKYTFTM 1560 

RRAGMGD GTD MEVEDDS PAS YTYTVYRDGT KIKEGLTETT YRDAGMSAQS HE Y CVEVKYT 1620 

AGVSPKVCVD YIPDGVADVT AQKPYTLTW GKTITVTCQG EAMIYDMNGR RLAAGRNTW 1680 

YTAQGGYYAV MVWDGKSYV EKLAIK 1706

<212> Type : PRT

<211> Length : 1706 

SequenceName : SEQ ID 176
SequenceDescription :

Sequence



<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString :

MKRKPLFSAL VILSGFFGSV HPAS AQ KVPA PVDGERI I ME LSEADVECTI KIEAEDGYAN 60

DIWADLNGNG KYDSGERLDS GEFRDVEFRQ TKAIVYGKMA KFL FRGS SAG DYGATFIDIS 120

NCTGLTAFDC FANLLTELDL SKACTGLTFVN CGKNQLTKLD LPANADIETL NCSKNKITSL 180

NLSTYTKLKE LYVGDNGLTA LDLSANTLLE ELVYSNNEVT TINLSANTNL KSLYCINNKM 240

TGLDVAANKE LKILHCNNNQ LTALNLSANT KLTTLSFFNN ELTNIDLSDN TALEWLFCNG 300

NKLTKLDV3A NANLIALQCS NNQLTALDLS KTPKLTTLNC YSNRIKDTAM RALIESLPTI 360

TEGEGRFVPY NDDEGGEEEN VCTTEHVEMA KAKNWKVLTS WGEPFPGITA LISXEGESEY 420

SVYAQDGILY LSGMEQGLPV QVYTVGGSMM YSSVASGSAM EIQLPRGAAY WRIGSHAIK 480

TAMP 484

<212> Type : PRT
<211> Length : 484
SequenceName : SEQ ID 177

SequenceDescription :



Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MKRAITLFAV LLMGWSVNAW S FAC KTANGT AIPIGGGSAN VYVNIxAPWN VGQNLWDLS 60 

TQIFCHNDYP ETITDYVTLQ RGSAYGGVX.S NFS GTVKYS G SSYPFPTTSE TPRWYNSRT 120 

DKPWPVALYL TPVSSAGGVA IKAGSLIAVL ILRQTNNYNS DDFQFWNIY ANNDWVPTG 180 

GCDVSARDVT VTLPDYPGSV PIPLTVYCAK SQNLGYYLSG TTADAGNS I F TNTASFSPAQ 240

GVGVQLTRNG TIIPANNTVS LGAVGTSAVS LGLTANYART GGQVTAGNVQ SIIGVTFVYQ 300

<212> Type : PRT
<211> Length : 300
SequenceName : SEQ ID 178

SequenceDescription : 



Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MGIKQHNGNT KADRLAELKI RSPSIQLIKF GAIGLNAIIF SPLIilAADTG SQYGTNITIN 60 

DGDRITGDTA DPSGNLYGVM TPAGNTPGNI NLGNDVTVNV NDAS GYAKGI IIQGKNSSLT 120 

ANRLTVDWG QTSAIGINLI GDYTHADLGT GSTIKSNDDG IIIGHSSTLT ATQFTIENSN 180 

GIGLTINDYG TSVDLGSGSK IKTDGSTGVY IGGLNGNNAN GAARFTATDL TIDVQGYSAM 240

GINVQKNSW DLGTNSTIKT NGDNAHGLWS FGQVSANALT VDVTGAAANG VEVRGGTTTI 300

GADSHISSAQ GGGLVTSSSD ATINFSGTAA QRNSIFSGGS YGASAQTATA VINMQNTDIT 360
VDRMGSLALG LWALSGGRIT GDSLAITGAA GARGIYAMTN SQIDLTSDLV IDMSTPDQMA 420

IATQHDDGYA ASRINASGRM LINGSVLSKG GLINLDMHPG SVWTGSSLSD NVNGGKLDVA 480

MNNSVWNVTS NSNLDTLALS HSTVDFASHG STAGTFTTLN VENLSGNSTF IMRADWGEG 540 

NGVNNRGDLL NISGSSAGNH VLAIRNQGSE ATTGNEVLTV VKTTDGAASF SASSQVELGG 600

YLYDVRKNGT NWELYASGTV PEPTPNPEPT PAPAQPPIVN PDPTPEPAPT PKPTTTADAG 660

GNYLNVGYLL NYVENRTLMQ RMGDIiRMQS K DGNIWLRSYG GSLDSFASGK LSGFDMGYSG 720

IQFGGDKRLS DVMPLYVGLY IDSTHAS PDY SGGDGTARSD YMGMYASYMA QNGFYSDLVI 780

KASRQKNSFH VLDSQNNGVN ANGTANGMSI SLEAGQRFNL SPTGYGFYIE PQTQLTYSHQ 840 

NEMAMKASNG LNIHLNHYES LLGRASMILG YDITAGNSQL NVYVKTGAIR EFSGDTEYLL 900 

NDSREKYSFK GNGWNNGVGV SAQYNKQHTF YLEADYTQGN LFDQKQVNGG YRFSF 955



<212> Type : PRT 
<211> Length : 955 

SequenceName : SEQ ID 179
SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T

<400> PreSequenceString :

MS KFVKTAI A AAMVMGVFTS TATIAAGNNG TARFYGTIED SVCSIVPDDH KLEVDMGDIG 60 
AEKLKNNGTT TPKSFQIRLQ DCVFDTQETM TTTFTGTVSS ANSGNYYTIF NTDTGAAFNN 120 
VSLAIGDSLG TSYKSGMGID QKIVKDTSTN KGKAKQTLNF NAWLVGAADA PDLGNFEANT 180 
TFQITYL 187 

<212> Type : PRT

<211> Length : 187

SequenceName : SEQ ID 180
SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MKXKTLAIW LSALSLSSAA ALADTTTWG GTIHFKGEW NAACAVDAGS VDQTVQLGQV 60 
RTASLKQAGA TSSAVGFNIQ LNDCDTXVAT KAAVAFLGTA IDATRTDVLA LQSSAAGSAT 120

NVGVQ ILDRT GNALTLDGAT FSAQTTLNNG TNTIPFQARY YAIGEATPGA ANADATFKVQ 180 

YQ 182 

<212> Type : PRT

<211> Length : 182
SequenceName : SEQ ID 181

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MASISSLGVG SGLDLSSILD SLTAAQKATL TPISNQQSSF TAKLSAYGTL KSALTTFQTA 60 
NTALSKADLF SATSTTSSTT AFSATTAGNA IAGKYTISVT HLAQAQTLTT RTTRDDTKTA 120 
IATSDSKLTI QQGDDKDPIT IDISAAJSTSSL SGIRDAINNA KAGVSASIIN VGNGEYRLSV 180 

TSNDTGLDNA MTLSVSGDDA LQSFMGYDAS ASSNGMEVSV AAQNAQLTVN NVAIENSSNT 240
ISDALENITL NLNDVTTGNQ TLTITQDTSK VQTAIKDWVN AYNSLIDTFS SLTKYTAVDA 300
GADSQSSSNG ALLGDSTLRT IQTQLKSMLS NTVSSSSYKT LAQIGITTDP SDGKLELDAD 360
KLTAALKKDA SGVGALIVGD GKKTGITTTI GSNLTSWLST TGI I KAATDG VSKTLNKLTK 420

DYNAASDRID AQVARYKEQF TQLDVLMTSL NSTSSYLTQQ FENNSNSK 468 

<212> Type : PRT

<211> Length : 468

SequenceName : SEQ ID 182
SequenceDescription :

Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MEGKADNWL ENGGRLDVLT GHTATNTRVD DGGTLDVRNG GTATTVSMGN GGVLLADSGA 60 

AVSGTRSDGK AFSIGGGQAD ALMLEKGSSF TLNAGDTATD TTVNGGLFTA RGGTLAGTTT 120

LNNGAILTLS GKTVNNDTLT IREGDALLQG GALTGNGSVE KSGSGTLTVS NTTLTQKAVN 180 

LNEGTLTLND STVTTDVIAQ RGTALKXiTGS TVLNGAIDPT NVTLASGATW NIPDNATVQS 240 
WDDLSHAGQ IHFTSTRTGK FVPATLKVKN LNGQNGTISL RVRPDMAQNN ADRLVIDGGR 300

ATGKTILNLV NAGNSASGLA TSGKGIQWE AINGATTEEG AFIQGNKLQA GAFNYSLNRD 360

SDESWYLRSE NAYRAEVPLY ASMLTQAMDY DRILAGSRSH QTGVS GENNS VRLSIQGGHL 420

GHDNNGGIAR GATPESSGSY GFVRLEGDLL RTEVAGMSVT AGVYGAAGHS SVDVKDDDGS 480 

RAGTVRDDAG SLGGYLNLIH NASGLWADIV AQGTRHSMKA SSDNNDFRVR GWGWLGSLET 540

GLPFSITDNL MLEPQLQYTW QGLSLDDGQ3D NASYVKFGHG SAQHVRAGFR LGSHHDMNFG 600

KGTS SRDTLR GSAKHSVREL PVNWWVQPSV IRTFSSRGDM SMGTAAAGSN MTFSPSQNGT 660

SLDLQAGLEA RVRENITLGV QASYAHSIKTG SSAEGYNSQA TLNVTF 706
<212> Type : PRT
<211> Length : 706

SequenceName : SEQ ID 183 

SequenceDescription :

Sequence


<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MAFSQAVSGL NAAATNLDVI GNNIANSATY GFKSGTASFA DMFAGSKVGL GVKVAGITQD 60
FTDGTTTNTG RGLDVAISQN GFFRLVDSUTG SVFYSRNGQF KLDENRNLLN TQGLQLTGYP 120

VTGTPPTIQQ GANPTNISIP NTLMAAKTTT TASMQINLNS SDPLPTVTPF SASNADSYNK 180
KGSVTVFDSQ GNAHDMSVYF VKTGDNNWQV YTQDSSDPMS IAKTATTLEF NANGTLVDGA 240
MANNIATGAI NGAEPATFSL SFLNSMQQKTT GANNIVATTQ NGYKPGDLVS YQINDDGTW 300
GNNSNEQTQL LGQIVLANFA NNEGLASEGD NVWSATQSSG VALLGTAGTG NFGTLTNGAL 360
EASNVDLSKE LVNMIVAQRN YKSNAQTIKT QDQILNTRVN LR 402

<212> Type : PRT

<211> Length : 402

SequenceName : SEQ ID 184
SequenceDescription :

Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T

<400> PreSequenceString :

MKLVHMASGL AVAIALAACA DKSADIQTPA PAANTS I SAT QQPAIQQPNV SGTVWIRQKV 60
ALPPDAVLTV TLSDASLADA P S KVLAQKAV RTEGKQSPFS FVLPFNPADV QPNARILLSA 120

AITVNDKLVF ITDTVQPVTN QGGTKADLXIi. VPVQQTAVPV QASGGATTTV PSTSPTQVNP 180 

SSAVPAPTQY 190 

<212> Type : PRT 

<211> Length : 190
SequenceName : SEQ ID 185

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MIIKKSGGRW QLSLLASWI SAFFLNTAYA WQQEYIVDTQ PGHSTERYTW DSDHQPDYND 60
ILSQRIQSSQ RALGLEVNLA EETPVDVTSS MSMGWNFPLY EQVTTGPVAA LHYDGTTTSM 120
YNEFGDSTTT LTDPLWHASV SSLGWRVDSR LGDLRPWAQI SYNQQFGENI WKAQSGLSRM 180
TATNQNGNWL DVTVGADMLL NQNIAAYAAL TQAENTTNNS DYLYTMGVSA RF 232

<212> Type : PRT 
<211> Length : 232 

SequenceName : SEQ ID 186 
SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKWCKRGYVL AAML ALA SAT IQAADVTITV NGKWAKPCT VSTTNATVDL GDLYSFSLMS 60
AGAASAWHDV ALELTNCPVG TSRVTASFSG AADSTGYYKN QGTAQNIQLE LQDDSGNTLN 120
TGATKTVQVD DSSQSAHFPL QVRALTVNGG ATQGTIQAVI SITYTYS 167
<212> Type : PRT
<211> Length : 167

SequenceName : SEQ ID 187 
SequenceDescription : 



Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKRAPLITGL LLISTSCAYA SSGGCGADST SGATNYSSW DDVTVNQTDM VTGREFTSAT 60 
LSSTNWQYAC SCSAGKAVKL VYMVSPVLTT TGHQTGYYKL NDSLDIKTTL QANDIPGLTT 120 
DQWSVNTRF TQIKSSTVYS AATQTGVCQG DTSRYGPVNI GANTTFTLYV TKPFLGSMTI 180 
PKTD I AVI KG AWVDGMGSPS TGDFHDLVKL SIQGNLTAPQ SCKINQGDVI KVNFGFINGQ 240 

KFTTRNAMPD GFTPVDFDIT YDCGDTSKIK NSLQMRIDGT TGWDQYNLV ARRRS SDNVP 300

DVGIRIENLG GGVANIPFQH GILPVDPSGH GTVNMRAWPV NLVGGELETG KFQGTAT I TV 360 
MVR 363

<212> Type : PRT 
<211> Length : 363 

SequenceName : SEQ ID 188

SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MQKNAAHTYA ISSLLVLSLT GCAWIPSTPL VQGATSAQPV PGPTPVANGS IFQSAQPINY 60 

GYQPLFEDRR PRNI GDTLT I VLQENVSASK SSSANASRDG KTNFGFDTVP RYLQGLFGNA 120 

RADVEAS GGN TFNGKGGANA SNTFSGTLTV TVDQVLVNGN LHWGEKQ I A INQGTEFIRF 180

SGWNPRTIS GSNTVPSTQV ADARIEYVGN GYINEAQNMG VJLQRFFLNLS PM 232



<212> Type : PRT 
<211> Length : 232 

SequenceName : SEQ ID 189 
SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T

<400> PreSequenceString :

MKRHLNTCYR LVWNHITGAF WAS ELARAQ GKRGGVAVAL SLAAVTSLPV LAADIWHPG 60 

ETVNGGTLVN HDNQFVSGTA DGVTVSTGLE LGPDSDENTG GQWIKAGGTG RNTTVTANGR 120

QIVQAGGTAS DTVIRDGGGQ SLNGLAVNTT LDNRGEQWVH GGGKAAGT 1 1 NQDGYQTIKH 180 

GGLATGTIVN TGAEGGPESE MVS S GQMVG G TAESTTINKN GRQVIWSSGM ARDTLIYAGG 240 

DQTVHGEAHN TRLEGGNQYV HNGGTATETL INRDGWQVI K EGGTAAHTT I NQKESCR 297



<212> Type : PRT 
<211> Length : 297 

SequenceName : SEQ ID 190 
SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MMMKTIKHLL CCAIAASALI STGVHAASWK DALSSAASEL GNQNSTTQEG GWSLASLTNL 60
LSSGNQALSA DNMNNAAGIL QYCAKQKLAS VTDAENI KNQ VLEKLGLNSE EQKEDTNYLD 120

GIQGLLKTKD GQQLNLDNIG TTPLAEKVKT KACDLVLKQG LNFIS 165 
<212> Type : PRT 
<211> Length : 165

SequenceName : SEQ ID 191 
SequenceDescription : 

Sequence 
<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MFKGQKTLAA LAVS LLF TAP VYAADEGSGE IHFKGEVIEA PCEIHQDDID KEVELGQVTT 60 
SHINQSHHSD AVAVDLL LVN CDLENSSNGS GGKI SKVAVT FDS SAKTTGA DPILNNTSTG 120 
EATGVGVRLM NKDQSNIVLG TATPDIDLAP TSSEQTLNFF AWMEQIDQAT PVTPGAVTAN 180

ATYVLDYK 188 
<212> Type : PRT 



<211> Length : 188 

SequenceName : SEQ ID 192 
SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MSAGSPKFTV RRIAALSLVS LWLAGCSDTS NPPAPVSSVN GNAPANTNSG MLITPPPKMG 60 
TTSTAQQPQI QPVQQPQIQA TQQPQIQPVQ PVAQQPVQME NGRIVYNRQY GNIPKGSYSG 120

STYTVKKGDT LFYIAWITGN DFRDLAQRNN I QAP YALKTVG QTLQVGNASG TPITGGNAIT 180
QADAAEQGW IKPAQNSTVA VASQPTITYS ESSGEQSA2STK ML PNNKPTAT TVTAPVTVPT 240

ASTTEPTVSS TSTSTPISTW RWPTEGKVIE TFGASEGGNK GIDIAGSKGQ AI I ATADGRV 300

VYAGNALRGY GNLIIXKHND DYLSAYAHND TMLVREQQEV KAGQKIATMG STGTSSTRLH 360

FEIRYKGKSV NPLRYLPQR 379

<212> Type : PRT 
<211> Length : 379 

SequenceName : SEQ ID 193 
SequenceDescription : 



Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MIKFLSALIL LLVTTAAQAE RIRDLTSVQG VRQNSLIGYG LWGLDGTGD QTTQTPFTTQ 60
TLNNMLSQLG ITVPTGTNMQ LKNVAAVMVT ASLPPFGRQG QTIDVWSSM GNAKS LRGGT 120 
LLMTPLKGVD SQVYALAQGN I LVGGAGAS A GGSSVQWQL NGGRI TNGAV IERELPSQFG 180 
VGNTLNLQLN DEDFSMAQQI ADTINRVRGY GSATALDART IQVRVPSGNS SQVRFLADIQ 240 
NMQVNVTPQD AKWINSRTG SWMNREVTL DSCAVAQG3STL SVTVNRQANV SQPDTPFGGG 300

QTWTPQTQI DLRQSGGSLQ SVRSSASLNN WRALNALGA TPMDLMSILQ SMQSAGCLRA 360

KLEII 365

<212> Type : PRT
<211> Length : 365 

SequenceName : SEQ ID 194 

SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKRSIIAAAV FSSFFMSAGV FAADVDTGTL TIKGNIAETSP CKFEAGGDSV SINMPTVPTT 60 
VFEGKAKYST YDDAVGVTSS MLKISCPKEV AGVKLS L Z TN DKITGNDKAI AS S NDTVGDN 120 
SDVLDVSAPF NIESYKTAEG QYAIPFKAKY LKL TDNS VQ S GDVLSSLVMR VAQD 174 

<212> Type : PRT

<211> Length : 174 

SequenceName : SEQ ID 195 
SequenceDescription : 

Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MAVQKNVI KG ILAGTFALML SGCVTVPDAI KGSSTTPQQD LVRVMSAPQL YVGQEARFGG 60 
KWAVQNQQG KTRLEIATVP LDSGARPTLG EPS RGRI YAD VNGFLDPVDF RGQLVTWGP 120

ITGAVDGKIG NTPYKFMVMQ VTGYKRWHLT QQVIMPPQPI DPWFYGGRGW PYGYGGWGWY 180

NPGPARVQTV VTE 193 

<212> Type : PRT 

<211> Length : 193 
SequenceName : SEQ ID 196

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MRNKPFYLLC AFLWLAVSRV LAADSTITIR GYVRDNGCSV AAESTNFTVD LMENAAKQ FN 60 



NIGATTPWP FRILLSPCGN AVSAVKVGFT GVADSHNANL LALENTVSAA AGLGIQLLNE 120

QQNQIPLNAP SSAISWTTLT PGKPNTLNFY ARLMATQVPV TAGHINATAT FTLEYQ 176 

<212> Type : PRT 
<211> Length : 176 

SequenceName : SEQ ID 197 

SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKKLTVAALA VTTLLSGSAF AHEAGEFFMR AGSATVRPTE GAGGTLGSLG GFSVTNNTQL 60 
GLTFTYMATD NIGVELLAAT P FRHKI GTRA TGDIATVHHL PPTLMAQWYF GDASSKFRPY 120
VGAGINYTTF FDNGFNDHGK EAGLSDLSLK DSWGAAGQVG VDYLINRDWL VNMSWYMDI 180
DTTANYKLGG AQQHDSVRLD PWVFMFSAGY RF 212 
<212> Type : PRT 
<211> Length : 212 

SequenceName : SEQ ID 198 
SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T

<400> PreSequenceString :

MFFKRGKILS AGRLNKKSLG IVMFLSVGLL LAGCSGSKSS DTGTYSGSVY TVKRGDTLYR 60 
ISRTTGTSVK ELARLNGISP PYTIEVGQKL KLGGAKSSSS TRKSTAKSTT KTASVTPSSA 120

VPKSSWPPVG QRCWLWPTTG KVIMPYSTAD GGNKGIDISA PRGTPIYAAG AGKWYVGNQ 180

LRGYGNLIMI KHSEDYITAY AHNDTMLVNN GQSVKAGQKI ATMGS TDAAS VRLHFQIRYR 240 

ATAIDPLRYL PPQGSKPKC 259

<212> Type : PRT 
<211> Length : 259 

SequenceName : SEQ ID 199 
SequenceDescription : 



Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MAQVINTNSL SLITQNNINK NQSALSSSIE RLSSGLRINS AKDDAAGQAI ANRFTSNIKG 60

LTQAARNAND GISVAQTTEG ALSEINNNLQ RIRELTVQAS TGTNSDSDLD SIQDEIKSRL 120

DEIDRVSGQT QFNGVNVLAK DGSMKIQVGA NDGQTITIDL KKIDSDTLGL NGFNVNGGGA 180

VANTAAS KAD LVAANATWG NKYTVSAGYD AAKASDLLAG VSDGDTVQAT INNGFGTAAS 240

ATNYKYD S AS KSYSFDTTTA SAADVQKYLT PGV GDTAKGT ITIDGSAQDV QISSDGKITA 300

SNGDKLYIDT TGRLTKNGSG ASLTEASLST LAANNTKATT IDIGGTSISF TGNSTTPDTI 360

TYSVTGAKVD QAAFDKAVST SGNNVDFTTA GYSVNGTTGA VTKGVDSVYV DNNEALTTSD 420

TVDFYLQDDG SVTNGSGKAV YKDADGKLTT DAETKAATTA DPLKALDEAI SSIDKFRSSL 480

GAVQNRLDSA VTNLNNTTTN LSEAQSRIQD ADYATEVSNM SKAQIIQQAG NSVLAKANQV 540

PQQVLSLLQG 550

<212> Type : PRT

<211> Length : 550

SequenceName : SEQ ID 200
SequenceDescription :

Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MKKI ACL SAL AAVLAFTAGT SVAATSTVTG GYAQSDAQGQ MNKMGGFNLK YRYEEDNS PL 60 

GVIGSFTYTE KSRTASSGDY NKNQYYGITA GPAYRINDWA SIYGWGVGY GKFQTTEYPT 120

YKHDTSDYGF SYGAGLQFNP MENVALDFSY EQSRIRSVDV GTWIAGVGYR F 171 

<212> Type : PRT 
<211> Length : 171 
SequenceName : SEQ ID 201

SequenceDescription : 
Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 
MKRNIIGGAF TLASLMLAGH ALAEDGWNF VGEIVDTTCE VTSDTADQIV PLGKVSKNAF 60
SGVGSLASPQ KFSIKLENCP ATYTQAAVRF DGTEAPGGDG DLKVGTPLTA GNPGDFTGTG 120
QA I AATGVGI RIFNQSDNSQ VKLYNDSAYT AIDAEGKAEM KFIARYVATN ATVTAGTANA 180

DSQFTVEYKK 190
<212> Type : PRT
<211> Length : 190

SequenceName : SEQ ID 202 

SequenceDescription :

Sequence

<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString :

MKKSTLALW MGIVASASVQ AAEIYNKDGN KLDVYGKVKA MKYMSDNASK DGDQSYIRFG 60

FKGETQINDQ LTGYGRWEAE FAGNKAESDT AQQKTRLAFA GLKYKDLGSF DYGRNLGALY 120

DVEAWTDMFP EFGGDSSAQT DNFMTKRAS G LATYRNTDFF GVIDGLNLTL QYQGKNENRD 180

VKKQNGDGFG TSLTYDFGGS DFAISGAYTN SDRTNEQNLQ SRGTGKRAEA WATGLKYDAN 240

NIYLATFYSE TRKMTPITGG FANKTQNFEA VAQYQFDFGIi RPSLGYVLSK GKDIEGIGDE 300

DLVNYIDVGA TYYFNKNMSA FVDYKINQLD SDNKLNINND DTVAVGMTYQ F 351



<212> Type : PRT

<211> Length : 351

SequenceName : SEQ ID 203
SequenceDescription :

Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :

MRKQWLGICI AAGMLAACTS DDGQQQTVSV PQPAVCNGPI VEISGADPRF EPLNATANQD 60 
YQRDGKSYKI VQDPSRFSQA GLAAIYDAEP GSNLTASGEA FDPTKLTAAH PTLPIPSYAR 120

ITNLANGRMI WRINDRGPY GNDRVISLSR AAADRLNTSN NTKVRIDPII VAQDGSLSGP 180
GMACTTVAKQ TYALPAPPDL SGGAGTSSVS GPQGDILPVS NSTLKSEDPT GAPVTSSGFL 240

GAPTTLAPGV LEGSEPTPAP QPWTASSTT PATSPAMVTP QA_ASQSASGN FMVQVGAVSD 300

QARAQQYQQQ LGQKFGVPGR VTQNGAVWRI QLGPFASKAE AS TLQQRLQT EAQLQSFITT 360

AQ 362

<212> Type : PRT 
<211> Length : 362 

SequenceName : SEQ ID 204 
SequenceDescription : 


Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 
MKKKTIYQCV ILFFSLLNIH VGMAGP EQVS MHIYGNWDQ GCDVATKSAL QNIHIGDFNI 60
SDFQAANTVS TAADLNIDIT GCAAGITGAD VLFSGEADTL A3? TLLKLTDT GGSGGMATGI 120 
AVQILDAQSQ QEIPLNQVQP LTPLKAGDNT LKYQLRYKST KAGATGGNAT AVLYFDLVYQ 180

<212> Type : PRT 
<211> Length : 180

SequenceName : SEQ ID 205
SequenceDescription : 



Sequence 

<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MKNKLLFMML TILGAPGIAA AAGYDLANSE YNFAVNELSK SSFNQAAIIG QAGTNNSAQL 60 
RQGGSKLLAV VAQEGS SNRA KIDQTGDYNL AYIDQAGSAN DASISQGAYG NTAMIIQKGS 120

GNKANITQYG TQKTAVWQR QSQMAIRVTQ R 151
<212> Type : PRT 
<211> Length : 151 
SequenceName : SEQ ID 206
SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :

MMKFKKCLLP VAMLASFTLA GCQSNADDHA ADVYQTDQLN TKQETKTVNI ISILPAKVAV 60 
DNSQNKRNAQ AFGALIGAVA GGVIGHNVGS GSNSGTTAGA VGGGAVGAAA GSMWDKTLV 120 
EGVSLTYKEG TKVYTSTQEG KECQFTTGLA WITTTYNET RIQPNTKCPE KS 172 

<212> Type : PRT 
<211> Length : 172 

SequenceName : SEQ ID 207 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MQTKKNEIWV GIFLLAALLA ALFVCLKAAN VTSIRTESTY TLYATFDNIG GLKARSPVSI 60 
GGVWGRVAD I TLDPKT YLP RVTLEIEQRY NHIPDTSSLS IRTSGLLGEQ YLALNVGFED 120 
PELGTAILKD GDTIQDTKSA MVLEDLIGQF LYGSKGDDNK MSGDAPAAAP GNNETTEPVG 180 
TTK 183 
<212> Type : PRT 
<211> Length : 183 

SequenceName : SEQ ID 208 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :

MAPLAFSAQS LAESLTVEQR LELLEKALRE TQSELKKYKD EEKKKYTPAT VNRSVSTNDQ 60
GYAANPFPTS SAAKPDAVLV KNEEKNASET GS I YSSMTLK DFSKFVKDEI GFSYNGYYRS 120
GWGTASHGSP KSWAIGSLGR FGNEYSGWFD LQLKQRVYNE NGKRVDAWM IDGNVGQQYS 180
TGWFGDNAGG ENFMQFSDMY VTTKGFLPFA PEADFWVGKH GAPKIEIQML DWKTQRTDAA 240
AGVGLENWKV GPGKIDIALV REDVDDYDRS LQNKQQINTH TIDLRYKDIP LWDKATLMVS 300
GRYVTANESA SEKDNQDNNG YYDWKDTWMF GTSLTQKFDK GGFNEFSFLV ANNS I ARNFG 360
RYAGASPFTT FNGRYYGDHT GGTAVRLTSQ GEAYIGDHFI VANAI VYS FG NNIYSYETGA 420
HSDFESIRAV VRPAYIWDQY NQTGVELGYF TQQNKDANSN KFNESGYKTT LFHTFKVNTS 480
MLTSRLEIRF YATYIKALEN ELDGFTFEDN KDAQFAVGAQ AEIWW 525
<212> Type : PRT 
<211> Length : 525 

SequenceName : SEQ ID 209 

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 
MKKRILSAVL VSGVTLSSAT TLSAVKADDF DAQIASQDSK INNLTAQQQA AQAQVNTIQG 60 
QVSALQTQQA ELQAENQRLE AQSATLGQQI QTLSSKIVAR NESLKQQARS AQKSNAATSY 120 
INAIINSKSV SDAINRVSAI REWSANEKM LQQQEQDKAA VEQKQQENQA AINTVAANQE 180 
TIAQNTNALN TQQAQLEAAQ LNLQAELTTA QDQKATLVAQ KAAAEEAARQ AAAAQAAAEA 240 
KAAAEAKALQ EQAAQAQAAA NNNTQATDVS DQQAAAADNT QAAQTGDSTE QSAAQAVNNS 300

DQESTTATEA QPSASSASTA AVAANTS SAN TYPAGQCTWG VKSLAPWVGN YWGNGGQWAA 360

SAAAAGYRVG STPSAGAVAV WNDGGYGHVA YVTGVQGGQI QVQEANYAGN QSIGNYRGWF 420 
NPGSVSYIYP N 431 
<212> Type : PRT 
<211> Length : 431 

SequenceName : SEQ ID 210 

SequenceDescription : 

Sequence 






<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MKVKKTYGFR KSKISKTLCG AVLGTVAAVS VAGQRVFADE TTTTSDVDTK WGTQTGNPA 60 

TNLPEAQGSA SKEAEQSQNQ AGETNGSIPV EVPKTDLDQA AKDAKSAGVN WQDADVNKG 120

TVKTAEEAVQ KETEIKEDYT KQAEDIKKTT DQYKSDVAAH EAEVAKIKAK NQATKEQYEK 180

DMAAHKAEVE RIMAANAASK TAYEAKLAQY QADLAAVQKT NAANQAAYQK ALAAYQAELK 240

RVQEANAAAK AAYDTAVAAN NAKNTE I AAA NEEIRKRNAT AKAEYETKLA QYQAELKRVQ 300

EAMAANEADY QAKLTAYQTE LARVQKANAD AKAAYEAAVA ANNAKNAALT AENTAIKQRM 360

ENAKATYEAA LKQYEADLAA VKKANAANEA DYQAKLTAYQ TELARVQKAN ADAKAAYEAA 420

VAANNAANAA LTAENTAIKK RNADAKADYE AKLAKYQADL AKYQKDLADY PVKLKAYEDE 480

QASIKAALAE LEKHKNEDGN LTEPSAQNLV YDLEPNANLS LTTDGKFLKA SAVDDAFSKS 540

TSKAKYDQKI LQLDDLDITN LEQSNDVASS MELYGNFGDK AGWSTTVSNIsT SQVKWGSVLL 600 

ERGQSATATY TNLQNSYYNG KKISKIVYKY TVDPKSKFQG QKVWLGIFTD PTLGVFASAY 660 

TGQVEKNTSI FIKNEFTFYD EDGKPINFDN ALLSVASLNR ENNSIEMAKD YTGKFVKISG 720

SSIGEKNGMI YATDTLMFRQ GQGGARWTMY TRASEPGSGW DSSDAPNSWY GAGAIRMSGP 780

NNSVTLGAIS STLWPADPT MAI ETGKKPN IWYSLNGKIR AVNVPKVTKE KPTPPVKPTA 840

PTKPTYETEK PLKPAPVAPN YEKEPTPPTR TPDQAEPNKP TP P T YE TE KP LEPAPVEPSY 900

EAEPTPPTRT PDQAEPNKPT PPTYETEKPL EPAPVEPSYE AEPTPPTPTP DQPEPNKPVE 960 

PTYEVIPTPP TDPVYQDLPT PPSVPTVHFH YFKLAVQPQV NKEIRNNNDI NIDRTLVAKQ 1020 

SWKFQLKTA DLPAGRDETT SFVLVDPLPS GYQFNPEATK AAS PGFDVTY DMATNTVTFK 1080 

ATAATLATFN ADLTKSVATI YPTWGQVLN DGATYKNNFT LTVNDAYGIK SNWRVTTPG 1140

KPNDPDNPNN NYIKPTKVNK NENGWIDGK TVLAGSTNYY ELTWDLDQYK NDRSSADTIQ 1200

KGFYYVDDYP EEALELRQDL VKITDANGME VTGVSVDNYT NLEAAPQEIR DVLSKAGIRP 1260

KGAFQIFRAD NPREFYDTYV KTGIDLKIVS PMWKKQMGQ TGGSYENQAY QIDFGNGYAS 1320

NIIINNVPKI NPKKDVTLTL DPADTNNVDG QTIPLNTVFN YRLIGGIIPA DHSEELFEYN 1380

FYDDYDQTGD HYTGQYKVFA KVDITFKDGS I IKS GAEL TQ YTTAEVDTAK GAITIKFKEA 1440

FLRSVSIDSA FQAESYIQMK RIAVGTFENT YINTVNGVTY SSNTVKTTTP EDPTDPTDPQ 1500

DPSSPRTSTV INYKPQSTAY QPSSVQETLP NTGVTNNAYM PLLGIIGLVT SFSLLGLKAK 1560

KD 1562
<212> Type : PRT

<211> Length : 1562

SequenceName : SEQ ID 211 
SequenceDescription : 



Sequence 

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString : 

MLTELKAVLK KPMLWITMVG VALVPALYNI IFLSSMWDPY GKVSDLPVAV VNKDKTATYE 60 

GKKMTIGKDM TDNMVRNKSL DYHFVDSEKA QKGLEKGDYY MI ITLPEDLS QNAASVLTDE 120

PKKLTIPYQT SKGHSFVASK MS ETAAKTL K ESVSKNITSS YTKSLFKNMS TLKTGLGSAA 180

NASQKIATGS KQLANGSQVM TDNLNLLSNS SQSFAQGTNT LYSGLTAYTG GVGQLSAGLN 240

NLNNGLTAYT NGVGQLANGS SQLSNQSQKL LGGVAQLAJSTG SASIQQLVNA SSQLNQGLIK 300

LSTATGLSEE QVQQFSSLIN QLGTLNQSIQ NYSDNGTATT ANS PDLSTYL SAITTAAQAI 360

VNSGNTSQQT TTNQSNALAA VQATGAYQRL SAEDQSEIAA ALANTGSSTT TTGADANAVS 420

QAQAILNNVQ SIQSALSTLQ TTTANTPTSP SASLTQIKNT ANSVLPSAAT SLTTLS SGLT 480

QAKTALDSQV VPVSTALANG TAQLGSTFST GANS LMTGVG QYTNAVDILN AGANTLAAKN 540

NQLTDGTSQL VNGANQLNSN SGQLTKGTAQ LANGANQIET GAGKLAAGGE SLTAGLTTLS 600

SGSGELSKAL STAKNKLSLV AVDNDNAKTL SSPVTIKHTD KDNVKTNGVG MAP YMMS AAL 660

MVMAISTNTI FRVALSGKQA KTLREWIDQK LAVNGLIAVT GAIILYFGVH IIGLSANFEL 720

KTLGLIILTS ITFMVLVTTL VTWHDKFGSF AAL ILLLLQL GSSAGTYPLA VTDKFFQWN 780

PYLPMSYSVS GLRETISMAG TIGMQLLALS LFFLTFAAXiG LLIARRRIRS VKVA 834 

<212> Type : PRT 
<211> Length : 834 
SequenceName : SEQ ID 212

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MVSQKNKSKK GQSKTFTLIS NRINLLFFLI VALFTVLLLR LAQMQLYDAK FYKSKLTEST 60 

TYTIKTSSPR GQ I YDAKGVA LVENEVKEW AFTRSNTMTA KDI KANAKKL ADMVTLTESK 120 

VTKRQKKDYY LADPKNYQKI VKKLPNNKKY DNFGNNLTES KIYANAVKAV PNSAIDYSED 180 

EKKIIHIFSQ MNATSVFNTA SLTTGDLTAE QIAVLATSKS DLKGISVKTD WERKTDKNSI 240

TSIIGKVSSQ KTGLPAE EAN NYVKKGYSLN DRVGTSYLEK QYENDLQGSR TVQAIKVNKE 300

GKIISDKTTA KGTKGKNLKL TLDLEFQKGV EQILNQYFNS ELASGNTKYS EGVYAWLNP 360
NTGAVLSMAG LEHDLKTGEV SSNALGAVTE VFTPGSWKG ATLTAGWENG VXiSGNQVLND 420

QPIQFAGSSP INSWFTNGST PLTASQSLEY SSNTYMVQLA LKLMGQDYHS GMTLSTDGYK 480

EAMEKLRATY AQYGLGVSTG IDLPGESKGY TPEHYDPSNV LTE S FGQFDN YTAMQLAQYA 540

AAVANGGKRI APHLVEGIYD NNKTGGLGNL VQSIDTKVLN NVSISSDDMG IIKEGFYNW 600

NGGSYATGKT LAKGASVPIS AKTGTAEAYV TGDDGKSVYT SNLNWAYAP SSNPQIAVAV 660

VLPHETDLHG TTSHAITRDI INLYQKMYPM NQ 692 
<212> Type : PRT 
<211> Length : 692 

SequenceName : SEQ ID 213 



SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus mutans UA159 

<400> PreSequenceString :

MTVLKYGLGI LLSAIILAII IGGLLFTYYV SSTPKLSEAK LKATNSSLVY DSNNNLIADL 60 

GAEKRESISS DSIPMKLVNA VTSIEDHRFF KHRGVDIYRI IGAAWSNLLH ECSTQGGSTLD 120

QQLIKLAYFS TKESDQTLKR KAQEVWLSLQ MEKKYTKEEI LTFYVNKVYM GNGNYGMRTA 180

AKSYYGKDLK DLSIAQLATL AGIPQAPTQY DPYAQPKAAT SRRNTVLSQM YKHKKI TKRE 240

YD AAV ATP IS DGLQELKRSS SYPKYMDNYL KQVISEVKKR TGQDIFSAGM KVYTNVNADA 300

QQYLWNIYNT DEYIAYPDDN FQVASTVMDV TNGKVIAQLG GRHQDTNVSF GTNQAVLTDR 360

DWGSTMKPIS AYGPALESEA FTTTAQMLND SVYYYPGTTT QVYDWDHRYN GWMTIQTAIQ 420

QSRNVPAVRA IDAAGLDTAK GFLSGLGIDY PEMRYSNAIS SNTSSSEQECY GAS S EKMAAA 480

YAAFSNGGTY YEPQYVNKIE FKDGTSETYD AKGNRAMKET TAYMMTDMLK T7VLTYGTGTE 540

AAI PGLYQAG KTGTSNYDDN ELVEMSEKLG INPYGLGTIA PDENFVGYTP QYSMAVWTGY 600

KNRLMPVYGD SMKIAAQVYR TMMAYLSSSG NSDWTMPDGL YRSGGYLYLN GSSGSNSRYG 660 

AAPATSSSSS SSSSSDSNNN DQNNNQTTEA SSDSSSSSSD ATTSSNP 707 
<212> Type : PRT 
<211> Length : 707 

SequenceName : SEQ ID 214

SequenceDescription : 



Sequence 



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString : 

MKSKTAKITL LSSLALAAFG ATNVFADEAS TQLNSDTVAA PTADTQASEP AATEKEQSPV 60 

VAWESHTQG NTTTTTSQVT SKELEDAKAN ANQEGLEVTE TEAQKQPSVE AADADNKAQA 120 

QTINTAVADY QKAKAEFPQK QEQYNKDFEK YQSDVKEYEA QKAAYEQYKK EVAQGLASGR 180 

VEKAQGLVFI NEPEAKLSIE GVNQYLTKEA RQKHATED I L QQYNTDNYTA SDFTQANPYD 240

PKEDTWFKMK VGDQISVTYD NIVNSKYNDK KISKVKINYT LNSSTNNEGS JVLVNLFHDPT 300

KTIFIGAQTS NAGRNDKISV TMQI IFYDEN GNEIDLSGNN AIMSLSSLNH WTTKYGDHVE 360

KVNLGDNEFV KIPGSSVDLH GNEIYSAKDN QYKANGATFN GDGAD GWDAV ISJAD GT P RAAT 420

AYYGAGAMTY KGEPFTFTVG GNDQNLPTTI WFATNSAVAV PKDPGAKPTP PEKPELKKPT 480

VTWHKNLWE TKTEEVPPVT PPTTPDEPTP EKPKTPEDPQ SPWAKSVSF RTARKGEMRV 540

RERDYQPTLP HAGAAKQNGL ATLGAISTAF AAATL I AARK KENT 583 
<212> Type : PRT 
<211> Length : 583 

SequenceName : SEQ ID 215 



SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus mutans UA159 

<400> PreSequenceString :

MEQKIFSKRK SKIAGLCGAI LTTTWALAS GTVIEADETI EQPVAAETVS QADGDNPEQT 60 
TSVQQETAPQ QTKTSQSSDA TVDSEESATS PSDEQTVSQN DSNSSSQIDQ TIADTNRSDS 120 
DHISKTSAAT TEDQEEKVNS AKAQTAAATN NQDTRYSAKD AYGNSNFNKT LTEFGKNANV 180 
ADVTYNGVRD EYIWNDPSA PYVPNANEIA KYLKEYLTEL RNINNIAIPV E>SVDQVMQKY 240 

AQDRANEEAN EKNGLDHDTN LPIPNNLTWV AEDGHLDMDS SIQSKSQEGY TLASDKATAY 300

YLALNWFSDY FNI YDDPNDG LKSFGHAVS I LSDGGTGMGL GLASGQDNEK GMWYAQLEFG 360

GNDNEDNTND FSSLKNGKGE WVLYYKGSPV KFLPNTTFWY VKKGTSPDAA STPHNSDKPS 420 
FQSSKDLDPN FKADNRFQEG KEASVHQAIP ATFKSHRDEV GNKDQNSLSA QLPDTGVQKN 480 
NQLALIALGT GLILLSGLLL SKRKSLK 507 

<212> Type : PRT

<211> Length : 507 

SequenceName : SEQ ID 216 
SequenceDescription :
Sequence 



<213> OrganismName : Streptococcus mutans UA159

<400> PreSequenceString :

MTFEKQKHFS LRKLKFGLVS VAIIAFLFAV TKTAEADETV XTEQRQTSKI NASSQKVENQ 60

TSNQVEAKTD SANKDPQEKT GSVATDAPSM NSANNMSQSD KQNTVNEISS DSQQTKTDEQ 120

TDLPQNSFKQ QSAHVKMTTE AEKTPSHSIN TFVNDGNGNW YYLGADGRNV TGSHTIGGKT 180

MYFAQDGKQV KGAFAQDSDG NKHYYDRDSG EMWTNRFVND QGNWYYLNND GVPVTGSITV 240

NGQSLYFNSD GSQVKGNFVE EDGSLRYYDK NSGDLLRKTS RTINGVNYQF DNDGNARAID 300

KI EWKTS L V VDSYEFGPSV SKIILEFNHK VTPAWHAGA MVTTAGVQRK ILNSYVSNAS 360

GHWYFDSSH YVTLELDIPY DPNDSSRNAS PFIFDSAAFR NNWVNSYTVK VDNLQVQADG 420

SNSSQIISSE QDAINNRFLP TTDRFSER0S YGNFNYAAYQ PEAAIGGEKN PLIVWLHGIG 480

EVGTDINIPIi LASNVARLTE DPIQSHFTST GSGGQKGAYV XiVPQSSIPWS QNQTASLMAL 540

IKAYVASHPD IDSRRIYLAG VSNGGGMTLD MGVAYPNYFA ALVPIAASYS NQLTDNQITA 600

AALKALKGQP MWLIHTRTDK TISADSSVLP FYKELLQAGA QNKWLSYYET NVGKHHSGVT 660

YNGHWSWIYF LNDQVTGTQN TDNAKNWSGL SGMVATNPTY GGDAKATVNG RTYSNVFDWL 720

NGQRRR 726

<212> Type : PRT

<211> Length : 726

SequenceName : SEQ ID 217
SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159

<400> PreSequenceString :

MKIFIKKHQQ SILYYSLSFL LPSFIMFLVL FSKNIYWGSS TTILASDGFH QYVI FDALFR 60

NILHGTDSLF YSFKAGLGFN IFALTSYYLG SFLTPFTYFF 2SFVKNMADAFY LFTLIKFGLI 120

GLSAFYSLGQ IYTKISKSLV LMLSTSYALM SFTSSQLELN NWLDVFILLP LIMLGLQRLV 180

EKRGIFLYFL TLTCLFIQNY YFGFMTAIFL TLWFFTQVSW DIRNRMKRLS DFVLVSIFAT 240

LTSAFMLLPT FLDLKSHGEV LTEQISLFSS DIWYFDFFAK SLLGSYDTTK YGSXPTIYIG 300

LLPLIFAITF FFVKSIKWQV KVAYFLLLAI IIASFIFQPL DLFWQGMHSP NMFLHRYSWA 360

FSLVIVIMAA ETLTRIKDIK LKNFYPAFTF LGVGLLATFL FKDYYNYLTQ VNFILTTIFL 420

VSYFIILFTF FNQLVSYKVI ISFTLIFTSF EIALNTFYQX EGIQTDWNFP S RE VYEDNVK 480

EIDNYVKKTK KDNLEFFRTE KQIPQTYNDG MKFNYNSISQ FSSVKNNLSA QLLNSLGYYS 540

QGNHSTISYP NNTILMDSLF SIKYNINNQN PHKFGFHLKQ KNNKLQLYKN FYSLPLALMS 600

NHIYKDVKFD SYPLDNQQKF VNELTDLNLT LFKEIPIISS VGMQVLDNRV TINGS KGNKA 660

QVYYTVKCPA NSQLYISLPN LTVNNKDENV FITTNKHTSS YIIDESYYLF NLGNYKKTQT 720

LIFKLSFPKN KTVSYDLPHI YALDLTAYQK SIKQLKSQTV KTTTKKNKIF TTYVAKKRTS 780

LIYTLPYDKG WFAKQNGKAI KISKAQNGLM KIDVSKGSGK I IMTFVPQGL YQGILLTCLG 840

IFLFVFYQLY YKKFNIiK 857

<212> Type : PRT
<211> Length : 857

SequenceName : SEQ ID 218
SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MKLKHILRIG AVAFAS ILLL TACGSKTSKK TVTLATVGTT NPFSYEKKGK LTGYDIEVAK 60
EVFKASDKYD VKYQKTEWTS IFSGLDSDKY QIGANNISYT KERANKYLYS NPTASNPLVL 120

WPKDSDIKS YNDIAGHSTQ WQGNTTVSM LQKFNKNHEN ETQVKLNFTSE DLAHQIRNVS 180
DGKYDFKIFE KISAETIIKE QGLDNLKVID LPSDQKPYVY FIFAQDQKDL QKFVNKRLKK 240

LYENGTLEKL SKKYLGGSYL PDKKDMK 267

<212> Type : PRT
<211> Length : 267

SequenceName : SEQ ID 219

SequenceDescription :

Sequence

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MRFLVFL I AF FAAFYKFIET ERIDSNTVAV NPDSLILKRF XjKTNQLNGIM IVTGPDGKAQ 60

VFSNQSKVDG SPVSIKDYFP LASLQKLITG VAIQQLIDKG KLSLNTPLSK YYPQIENSEN 120

ITIQNLLTHT SGLADRKEVP QQVLTTQEQQ LDFSLTNYRV TYRKKWKYAN INYALLAGII 180

SQISGQNYAT YVRQHFLTAG KGWHFKKYIQ IKDKSKLAAL SVMDQS TTWD KLSKEVTSTF 240

GAGDYASRPV DYWKFMMAFI NDQFVPVSEY QRSMKMTSKS YYGGLYISQK MLHANGGGFD 300

TYSCFAYSNP KTKQVMVLFI TNGKYKRVKS LAAKAFKLYA DSYALRKNET SK 352



<212> Type : PRT 

<211> Length : 352 

SequenceName : SEQ ID 220 
SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MKKKIALAAL SFVSAAVLAA CSSAPGGSSD AAGNKI GDTV KIGYNLELSG DVAAYGQAEK 60
NGANLAVEEI NKAGGIDGKK IKVISKDNKS DNGEASTIST NLATQSKVNA ILGPATSGAT 120
AAAAPNANDA AVPLVTPSGT QDNLTYSKGK VQDYIFRTTF QDSFQGKIIA KYATDNLKAK 180
KVALYYDKSS DYAQGIADAF KKAYKGKI TV EDTFQAKDQD FQAALTKFKN KDFDAIVXPG 240
YYTETGLITK QARDMGLTQP I LGPDGFNDE KYVEGAGAAN TNNVHYVSGY STKVALTNKA 300
EKFLKDYKAK YGEEPNMFAA LA YD S V YM I A DAAKDAKTSK DIATNLAKLK NFKGVTGKMT 360
IDKKHNPVKS AVMVGLKDGK EDTATAVEAK 390
<212> Type : PRT 
<211> Length : 390 

SequenceName : SEQ ID 221 

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 
MKKLSLLLLV CLSLLGLFAC TSKKTADKKL TWATNS 1 1 A DITKNIAGNK WLHSIVX>VG 60 
RDPHEYEPLP EDVKKTSQAD VIFYNGINLE NGGNAWFTKL VKNAHKKTDK DYFAVSDSVK 120
TIYLENAKEK GKEDPHAWLD LKNGIIYAKN IMKRLSEKDP KNKSYYQKNF QAYSAKLEKL 180 
HKVAKEKISR IPTEKKMIVT SEGCFKYFSK AYDIPSAYIW EINTEEEGTP NQ I KAL VTCKL 240 
RKSRVSALFV ESSVDDRPMK TVSKDTGIPI AAKIFTDSVA KKGQAGDSYY AMMKWNLDKI 3 00 

ANGLSQ 3 06 

<212> Type : PRT
<211> Length : 306

SequenceName : SEQ ID 222

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 
MFVHTKTKKK RKWQRKVFLL LLLFLLPIVS VLAFIVLFIG GGTAESHDVE ATTGGVKIiSA 60
KQFADKTKLG ISEEEAKNAL AFADRLMSRH HFTAQATAGV LAVGFRESGF DVKAVNUSGG 120
VAGFFQWSGW GSSVNGDRWK VASKRELTLE VEVDLMSTEL DGRYADWKK VGSATDEKQA 180

AKDWSQYYEG VAVSDGQTKA DKIESWATTI CEALKSGGTN YAKVNNTGTS STAIPQGWEN 240
ISAFDGHAYE GSENYPQGQC TWYVYNRAKQ LGVSFSPYMG NGGQWYQVQG YHSSHTPKAH 300

TALSFVNGQA GSDPTYGHVA FVEAVKDDGS ILISEMNVYG QPAMTVAYRT FDAETAKQFW 360
YVEGK 365

<212> Type : PRT 
<211> Length : 365 

SequenceName : SEQ ID 223 

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MKMKRKLLSL VSVLTILLGA FWVTKIVKAD QVTNYTNTAS ITKSDGTALS NDPSKAVTSTYW 60 

EPLSFSNSIT FPDEVSIKAG DTLTIKLPEQ LQFTTALTFD VMHTNGQLAG KATTDPKTTGE 120 

VTVTFTDIFE KLPNDKAMTL NFNAQLNHNN ISIPGWNFN YNNVAYSSYV KDKDITPISP 180 

DVNKVGYQDK SNPGLIHWKV LINNKQGAID NLTLTDWGE DQEIVKDSLV AARLQYIAGD 240

DVDSLDEAAS RPYAEDFSKN VTYQTNDLGL TTGFTYTIPG SSNNAIFISY TTRLTSSQSA 300



GKDVSNTIAI SGNNINYSNQ TGYARIESAY GRASSRVKRQ AETTTVTETT TSSSSETTTS 360

EATTETSSTT NNNSTTTETA TSTTGASTTQ TKTTASQTNV PTTTNITTTS KQVTKQKAKF 420
VLPSTGEQAG LLLTTVGLVI VAVAGVYFYR TRR 453
<212> Type : PRT
<211> Length : 453

SequenceName : SEQ ID 224 

SequenceDescription : 

Sequence 

<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString : 

MTFKKLVLGL LSFVAVFTLV ACSSSNSKNL QDDIKEKKKL WAVSPDYAP FEFKALWGK 60 
DTWGAD I DL AKATAKELGV KLELSSMSFD NVLSSLKTGK ADIAISGLSY TKERAQAYDF 120

SEAYYKTENA ILIKKSDLNK YTMISSFNNK TKVAVQKGTI EEGLAKNQLK QSNITSLTSM 180

GEAWELKSG QVDAIDLEKP VAEGYVSQNS DLVLAKVALK TGEGDAKAVA LPKDSGQLVK 240
TVNKVIKKLK KEDKYKQFIS DAVKLTGQQV D 271
<212> Type : PRT 
<211> Length : 271 

SequenceName : SEQ ID 225

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString : 

MKKHFFMTFS LLLAAVFLVA CSNLSDSGQR NWDKINKRGM LKIATAGTLY PQSYHDDHNK 60 

LTGYDVEILK EIGKRLGLKV QFTEMGVDGM LTAIKSGQID VANYSLEDGN KNISKFLRTS 120

PYKYSFTSMV VRSKDDSGIH SWSDLKGKKA AGAASTNYMK IAKKLGAKLV VYDNVTMDVY 180

MKDLVNGRTD VIINDYYLQK IAVAAVKDKY AIKINQGLYA NPYSTSFTLS LKNKVLQKKI 240

NKAVKDMRKD GTLTKLSKKF FQGEDVTKKH YNSYKKIDIS DVD 283
<212> Type : PRT 
<211> Length : 283 

SequenceName : SEQ ID 226 

SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6

<400> PreSequenceString :

MKLLKKMMQV ALATFFFGLL GTSTVFADDS EGWQFVQENG RTYYKKGALK ETYWRVIDGK 60 

YYYFDPLSGE MWGWQYI PA PHKGVTIGPS PRIEIALRPD WFYFGQDGVL QEFVGKQVLE 120

AKTATNTNKH HGEEYDSQAE KRVYYFEDQR SYHTLKTGWI YEEGYWYYLQ KDGGFDSRIN 180

RLTVGELARG WVKDYPLTYD EEKLKAAPWY YLDPATGWQN LGNKWYYLRS SGAMATGWYQ 240

EGSTWYYLNA SNGDMKTGWF QVNGNWYYAY DSGALAVNTT VGGYYLNYNG EWVK 294



<212> Type : PRT 
<211> Length : 294 

SequenceName : SEQ ID 227 
SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6 

<400> PreSequenceString :

MKLLKKMMQV LLAVFFFGLL ATNTVFANTT GGRFVDKDNR KYYVKDDHKA IYWHKIDGKT 60
YYFGDIGEMV VGWQYLEIPG TGYRDNLFDN QPVNEIGLQE KWYYFGQDGA LLEQTDKQVL 120
EAKTSENTGK VYGEQYPLSA EKRTYYFDNN YAVKTGWIYE DGNWYYLNKL GNFGDDSYNP 180

LPIGEVAKGW TQDFHVTIDI DRSKPAPWYY LDASGKMLTD WQKVNGKWYY FGSSGSMATG 240

WKYVRGKWYY LDNKNGDMKT GWQYLGNKWY YLRSSGAMVT GWYQDGLTWY YLMAGNGDMK 300

TGWFQVNGKW YYAYSSGALA VNTTVDGYSV NYNGEWVQ 338

<212> Type : PRT 
<211> Length : 338 

SequenceName : SEQ ID 228 

SequenceDescription :



Sequence 
<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString : 

MNKKKMILTS LASVAILGAG FVASQPTWR AEESPVASQS KAEKDYDAAK KDAKNAKKAV
EDAQKALDDA KAAQKKYDED QKKTEEKAAL EKAASEEMDK AVAAVQQAYL AYQQATDKAA
KDAADKMIDE AKKREEEAKT KFNTVRAMW PEPEQLAETK KKSEEAKQKA PELTKKLEEA
KAKLEEAEKK ATEAKQKVDA EEVAPQAKIA ELENQVHRLE QELKEIDESE SEDYAKEGFR
APLQSKLDAK KAKLSKLEEL SDKIDELDAE IAKLEDQLKA AEENNNVEDY FKEGLEKTIA
AKKAELEKTE ADLKKAVNEP EKPAPAPETP APEAPAEQPK PAPAPQPAPA PKPEKPAEQP
KPEKTDDQQA EEDYARRSEE EYNRLTQQQP PKAEKPAPAP KTGWKQENGM WYFYNTDGSM
ATGWLQNNGS WYYLNSNGAM ATGWLQYNGS WYYLNANGAM ATGWAKVNGS WYYLNANGAM
ATGWLQYNGS WYYLNANGAM ATGWAKVNGS WYYLNANGAM ATGWLQYNGS WYYLNANGAM
ATGWAKVNGS WYYLNANGAM ATGWVKDGDT WYYLEASGAM KASQWFKVSD KWYYVNGLGA

LAVNTTVDGY KVNANGEWV

<212> Type : PRT

<211> Length : 619 

SequenceName : SEQ ID 229 
SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pneumoniae R6 

<400> PreSequenceString : 

MKILPFIARG TSYYLKMSVK KLVPFLWGL MLAAGDSVYA YSRGNGSIAR GDDYPAYYKN 60 
GSQEIDQWRM YSRQCTSFVA FRLSNVNGFE IPAAYGNANE WGHRARREGY RVDNTPTIGS 120

ITWSTAGTYG HVAWVSNVMG DQIEIEEYNY GYTESYNKRV IKANTMTGFI HFKDLDSGSV 180

GNSQSSASTG GTHYFKTKSA IKTEPLVSAT VIDYYYPGEK VHYDQILEKD GYKWLSYTAY 240

NGSYRYVQLE AVNKNPLGNS VLSSTGGTHY FKIKSAIKTE PLVSATVIDY YYPGEKVHYD 300

QILEKDGYKW LSYTAYNGSR RYIQLEGVTS SQNYQNQSGN ISSYGSNNSS TVGWKKINGS 360

WYHFKSNGSK STGWLKDGSS WYYLKLSGEM QTGWLKENGS WYYLGSSGAM KTGWYQVSGE 420

WYYSYSSGAL AINTTVDGYR VNSDGERV 448

<212> Type : PRT 

<211> Length : 448 

SequenceName : SEQ ID 230
SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6 

<400> PreSequenceString :

MFASKSERKV HYSIRKFSIG VASVAVASLV MGSWHATEN EGSTQAATSS NMAKTEHRKA 60 

AKQWDEYIE KMLREIQLDR RKHTQNVALN IKLSAIKTKY LRELNVLEEK SKDELPSEIK 120

AKLDAAFEKF KKDTLKPGEK VAEAKKKVEE AKKKAEDQKE EDRRNYPTNT YKTLELEIAE 180

FDVKVKEAEL ELVKEEAKES RNEGTIKQAK EKVESKKAEA TRLENIKTDR KKAEEEAKRK 240

ADAKLKEANV ATSDQGKPKG RAKRGVPGEL ATPDKKENDA KSSDSSVGEE TLPSSSLKSG 300

KKVAEAEKKV EEAEKKAKDQ KEEDRRNYPT NTYKTLDLEI AESDVKVKEA ELELVKEEAK 360

EPRDEEKIKQ AKAKVESKKA EATRLENIKT DRKKAEEEAK RKAAEEDKVK EKPAEQPQPA 420

PATQPEKPAP KPEKPAEQPK AEKTDDQQAE EDYARRSEEE YNRLTQQQPP KTEKPAQPST 480

PKTGWKQENG MWYFYNTDGS MATGWLQNNG SWYYLNANGA MATGWLQNNG SWYYLNANGS 540

MATGWLQNNG SWYYLNANGA MATGWLQYNG SWYYLNSNGA MATGWLQYNG SWYYLNANGD 600

MATGWLQNNG SWYYLNANGD MATGWLQYNG SWYYLNANGD MATGWVKDGD TWYYLEASGA 660

MKASQWFKVS DKWYYVNGSG ALAVNTTVDG YGVNANGEWV N 701
<212> Type : PRT 
<211> Length : 701 

SequenceName : SEQ ID 231

SequenceDescription : 



Sequence 



<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString : 

MKKTTILSLT TAAVILAAYV PNEPILAAYV PNEPILADTP SSEVIKETKV GSIIQQNNIK 60

YKVLTVEGNI GTVQVGNGVT PVEFEAGQDG KPFTIPTKIT VGDKVFTVTE VASQAFSYYP 120

DETGRIVYYP SSITIPSSIK KIQKKGFHGS KAKTIIFDKG SQLEKIEDRA FDFSELEEIE 180

LPASLEYIGT SAFSFSQKLK KLTFSSSSKL ELISHEAFAN LSNLEKLTLP KSVKTLGSNL 240

FRLTTSLKHV DVEEGNESFA SVDGVLFSKD KTQLIYYPSQ KNDESYKTPK ETKELASYSF 300

NKNSYLKKLE LNEGLEKIGT FAFADAIKLE EISLPNSLET IERLAFYGNL ELKELILPDN 360

VKNFGKHVMN GLPKFLTLSG NNINSLPSFF LSGVLDSLKE IHIKNKSTEF SVKKDTFAIP 420

ETVKFYVTSE HIKDVLKSNL STSNDIIVEK VDMIKQETDV AKPKKNSNQG WGWVKDKGL 480

WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL 540

WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL 600

WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKDKGL WYYLNESGSM ATGWVKVSGK 660

WYYTYNSGDL LVNTTTPDGY RVNANGEWVG 690

<212> Type : PRT
<211> Length : 690

SequenceName : SEQ ID 232
SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6

<400> PreSequenceString :

MEINVSKLRT DLPQVGVQPY RQVHAHSTGN PHSTVQNEAD YHWRKDPELG FFSHIVGNGC 60
IMQVGPVDNG AWDVGGGWNA ETYAAVELIE SHSTKEEFMT DYRLYIELLR NLADEAGLPK 120

TLDTGSLAGI KTHEYCTNNQ PNNHSDHVDP YPYLAKWGIS REQFKHDIEN GLTIETGWQK 180

NDTGYWYVHS DGSYPKDKFE KINGTWYYFD SSGYMLADRW RKHTDGNWYW FDNSGEMATG 240

WKKIADKWYY FNEEGAMKTG WVKYKDTWYY LDAKEGAMVS NAFIQSADGT GWYYLKPDGT 300

LADRPEFTVE PDGLITVK 318

<212> Type : PRT 
<211> Length : 318 

SequenceName : SEQ ID 233 

SequenceDescription :

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :

MTFAYWCILI AYLLPLFCAA YAKKAGGFRF KDNHNPRDFL ARTQGTAARA HAAQQNGFEA 60

FAPFAAAVLT AHATGNAGQA TVNTLAGLFI LFRLAFIWCY IADKAALRSL MWVGGFVCTV 120

GLFWAA 127

<212> Type : PRT
<211> Length : 127

SequenceName : SEQ ID 234 
SequenceDescription : 



Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString : 

MNKIYRIIWN SALNAWVAVS ELTRNHTKRA SATVKTAVLA TLLFATVQAN ATDEDEEEEL 60

ESVQRSWGS IQASMEGSGE LETISLSMTN DSKEFVDPYI WTLKAGDNL KIKQNTNENT 120

NASSFTYSLK KDLTGLINVE TEKLSFGANG KKVNIISDTK GLNFAKETAG TNGDTTVHLN 180

GIGSTLTDTL AGSSASHVDA GNQSTHYTRA ASIKDVLNAG WNIKGVKTGS TTGQSENVDF 240

VRTYDTVEFL SADTKTTTVN VESKDNGKRT EVKIGAKTSV IKEKDGKLVT GKGKGENGSS 300

TDEGEGLVTA KEVIDAVNKA GWRMKTTTAN GQTGQADKFE TVTSGTNVTF ASGKGTTATV 360

SKDDQGNITV MYDVNVGDAL NVNQLQNSGW NLDSKAVAGS SGKVISGNVS PSKGKMDETV 420

NINAGNNIEI SRNGKNIDIA TSMAPQFSSV SLGAGADAPT LSVDDEGALN VGSKDANKPV 480

RITNVAPGVK EGDVTNVAQL KGVAQNLNNR IDNVDGNARA GIAQAIATAG LVQAYLPGKS 540

MMAIGGGTYR GEAGYAIGYS SISDGGNWII KGTASGNSRG HFGASASVGY QW 592

<212> Type : PRT
<211> Length : 592

SequenceName : SEQ ID 235 
SequenceDescription : 



Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString : 

MLLAEGQKSA VTEYYLNHGT WPSNNSDAGV ASTATDIKGK YVKEVKVEKG VITATMLSSG 60

VNNEIKGKKL SLWAKRQAGS VKWFCGQPVE RAANNAANDA VTAATANGNG KIDTKHLPST 120

CRDAASAVCI ETPPTAFYKN T 141
<212> Type : PRT 
<211> Length : 141 
SequenceName : SEQ ID 236
SequenceDescription : 

Sequence 

<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :

MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN YQYYRDFAEN 60 

KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSWSRNG VAALVGDQYI VSVAHNGGYN 120

NVDFGAEGRN PDQHRFSYQI VKRNNYKPDN SHPYNGDYHM PRLHKFVTDA EPVEMTSDMR 180

GNTYSDKEKY PERVRIGSGH HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGWSLSG 240

DVRHANDYGP MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW 300

FYDDIYRGDT HTVFFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ TVRLFDESLN 360
ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN NINQGAGGLY FEGDFTVSPE 420

NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTLHV QAKGENQGSI SVGDGTVILD 480

QQADDKGKKQ AFSEIGLVSG RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN 540

TDEGAMIVNH NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR 600

LNLVYQPAAE DRTLLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG WSKMEGIPQG 660

EIVWDNDWIN RTFKAENFHI QGGQAVISRN VAKVEGDWHL SNHAQAVFGV APHQSHTICT 720

RSDWTGLTNC VEKTITDDKV IASLTKTDIS GNVSLADHAH LNLTGLATLN GNLSANGDTR 780

YTVSHNATQN GNLSLVGNAQ ATFNQATLNG NTSASGNASF NLSNNAAQNG SLTLSDNAKA 840

NVSHSALNGN VSIiADKAVFH FENSRFTGQIi SGSKDTALHIj KDSEWTLPSG TELGNLNLDN 900

ATITLNSAYR HDAAGAQTGS VSDTPRRRSR RSLLSVTPPT SVESRFNTLT VNGKLNGQGT 960

FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD QLTWEGKDN KPLSENLNFT 1020

LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA 1080

GRDAAEKTES VAEPARQAGG ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPATTAFPR 1140

ARRARRDLPQ PQPQPQPQPQ PQRDLISRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR 1200

NAVWTSGIRD TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR TENTFDDGIG 1260

MSARLAHGAV FGQYGIGRFD IGISTGAGFS SGSLSDGIGG KIRRRVLHYG IQARYRAGFG 1320

GFGIEPYIGA TRYFVQKADY RYENVNIATP GLAFNRYRAG IKADYSFKPA QHISITPYLS 1380

LSYTDAASGK VRTRVNTAVL AQDFGKTRSA EWGVNAEIKG FTLSLHAAAA KGPQLEAQHS 1440

AGIKLGYRW 1449
<212> Type : PRT 
<211> Length : 1449 

SequenceName : SEQ ID 237

SequenceDescription : 

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString :

MNTLQKGFTL IELMIVIAIV GILAAVALPA YQDYTARAQV SEAILLAEGQ KSAVTEYYLN 60
HGEWPSNNTS AGVASSTDIK GKYVQSVEVK NGWTATMAS SNVNNEIKGK KLSLWAKRQD 120
GSVKWFCGQP VKRNDTATTN DDVKADTAAN GKQIDTKHLP STCRDAASAG 170
<212> Type : PRT

<211> Length : 170

SequenceName : SEQ ID 238

SequenceDescription : 

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString : 

MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK DMDLQALHGR 60
KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT DYTYPRYETT AETTSGGLTG 120
LTTSLSTLNA PALSRTQSDG SGSKSSLGLN IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF 180
FLRGIDWSP ANADTDVFIN IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL 240
IKPKTNAFEA AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN 300
SHEGYGYSDE AVRRHRQGQP 320
<212> Type : PRT

<211> Length : 320 

SequenceName : SEQ ID 239 
SequenceDescription : 

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 



<400> PreSequenceString :

MRPIFLSFVL FPILITACST PDKSARWENI GTISNGNIHT YINKDSVRKN GNLMIFQDKK 60 
WTNLKQERF ANTPAYKTAI AEWEIHCNNK TYRLSSLQLF DTKNTEISTQ NYTASSLRPM 120

SILSGTLTEK QYETVCGKKL 140 
<212> Type : PRT

<211> Length : 140 

SequenceName : SEQ ID 240

SequenceDescription : 

Sequence

<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :
MNKLFITALS ALALSACAGT WEGAKQDTAR NLDKTQAAAE RAAEQTGNAV SKGWDKTKEA 60
VKKGGNAVGR GISHLGGKIE NATE 84
<212> Type : PRT 
<211> Length : 84 

SequenceName : SEQ ID 241 
SequenceDescription : 


Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 

<400> PreSequenceString :
MKLLFIPLVL FVAVEHFYIA WLEMTQIPSE KAAETFKLPY EFMEQNRVQT LFGNQGLYNG 60

FLGIGLVWSR FAAPDNAVYG ATVLFLGFVL IAAAWGAFSS GNKGILVKQG LPAFLAAAAV 120

LAV 123

<212> Type : PRT

<211> Length : 123
SequenceName : SEQ ID 242

SequenceDescription : 

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString :

MASSNVNNEI KDKKLSLWAK RQDGSVKWFC GQPVKRDAAT DADVTADSGN EIDTKHLPST 60 
CRDAASAVCT KTPEYYPNHG EWPKNFVIPA QAGIQVCRHG NLSGKKVSPV LSSRFPLSWE 120 

<212> Type : PRT

<211> Length : 120 

SequenceName : SEQ ID 243 
SequenceDescription : 

Sequence

<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString : 

MLLAEGQKSA VTEYYLNHGE WPSNNTSAGV ATSTDIKGKY VQSVEVKNGV VTATMASSNV 60
NNEIKGKKLS LWAKRQDGSV KWFCGQPVKR NDTATTNDDV KADTAANGKQ IDTKHLPSTA 120

STRKSTPN 128

<212> Type : PRT 

<211> Length : 128 

SequenceName : SEQ ID 244 
SequenceDescription :

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491 

<400> PreSequenceString :

MPIPFKPVLA AAAIAQAFPA FAADPAPQSA QTLNEITVTG THKTQKLGEE KIRRKTLDKL 60 

LVNDEHDLVR YDPGISWEG GRAGSNGFTI RGVDKDRVAI NVDGLAQAES RSSEAFQELF 120 

GAYGNFNANR NTSEPENFSE VTITKGADSL KSGSGALGGA VNYQTKSASD YVSEDKPYHL 180 

GIKGGSVGKN SQKFSSITAA GRLFGLDALL VYTRRFGKET KNRSTEGDIE IKNDGYVYNP 240 

TDTGGPSKYL TYVATGVARS QPDPQEWVNK STLFKLGYNF NDQNRIGWIF EDSRTDRFTN 300

ELSNLWTGTT TSAATGDYRH RQDVSYRRRS GVEYKNELEH GPWDSLKLRY DKQRIDMNTW 360

TWDIPKNYDK RGINGEVYHS FRHIRQNTAQ WTADFEKQLD FSKAVWAAQY GLGGGKGDNA 420 
NSDYSYFAKL YDPKILASNQ AKITMLIENR SKYKFAYWNN AFHLGGNDRF RLNAGIRYDK 480

NSSSAKDDPK YTTAIRGQIP HLGSERAHAG FSYGTGFDWR FTKHLHLLAK YSTGFRAPTS 540

DETWLLFPHP DFYLKANPNL KAEKAKNWEL GLAGSGKAGN FKLSGFKTKY RDFIELTYMG 600

VSSDDKNNPR YAPLSDGTAL VSSPWQNQN RSAAWVKGIE FNGTWNLDSI GLPKGLHTGL 660

NVSYIKGKAT QNNGKETPIN ALSPWTAVYS LGYDAPSKRW GINAYATRTA AKKPSDTVHS 720

NDDLNNPWPY AKHSKAYTLF DLSAYLNIGK QVTLRAAAYN ITNKQYYTWE SLRSIREFGT 780

VNRVDNKTHA GIQRFTSPGR SYNFTIEAKF 810
<212> Type : PRT
<211> Length : 810
SequenceName : SEQ ID 245

SequenceDescription : 

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString : 

MKKSLIALTL AALPVAAMAD VTLYGTIKTG VETSRSVEHN GGQWSVETG TGIVDLGSKI 60 
GFKGQEDLGN GLKAIWQVEQ KASIAGTDSG WGNRQSFIGL KGGFGKLRVG RLNSVLKDTG 120 
DINPWDSKSD YLGVNKIAEP EARLISVRYD SPEFAGLSGS VQYALNDNVG RHMSESYHAG 180 

FNYKNGGFFV QYGGAYKRHQ DVDDVKIEKY QIHRLVSGYD NDALYASVAV QQQDAKLVED 240
NSHNSQTEVA ATLAYRFGNV TPRVSYAHGF KGSVDDAKRD NTYDQVWGA EYDFSKRTSA 300

LVSAGWLQEG KGENKFVATA GGVGLRHKF 329 
<212> Type : PRT 
<211> Length : 329 

SequenceName : SEQ ID 246

SequenceDescription : 

Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString : 

MKTLLLLXPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL SALKGRKAAL 60 
YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY PAYDTTATTK SDALSSVTTS 120 
TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT GDYRNETLLA NPRDVSFLTN LIQTVFYLRG 180 

IEWPPEYAD TDVFVTVDVF GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK 240
TAAYESQYQE QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP 300 
DVGNEVTRRR KGG 313 
<212> Type : PRT
<211> Length : 313

SequenceName : SEQ ID 247

SequenceDescription : 

Sequence 






<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString : 

MNKTLSILPV AILLGGCAAG GGNTFGSLDG GTGMGGSIVK MAVESQCRAE LNKRSEWRLT 60 
ALAMSAEKQA EWENKICACV AQEAPNQLTG NDVMQMLDPS TRNQALAALT AKTVSACFKH 120 
LYR 123

<212> Type : PRT

<211> Length : 123 

SequenceName : SEQ ID 248 
SequenceDescription : 

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString : 

MNPLIHQAKE SSMQTRILSA VLLAFSTAAF AGGAFTLQFD NPSEDGGFTQ NQILSAPYGF 60
GCSGGNASPA LSWKNPPAGT KSFVLTVYDK DAPTGLGWMH WWADIPADV RRRNA^SLQL 120

SRCASIADDQ SAAISAVISL QICRIRLTPS YTAKPMPSCC NHANTPQSAA SAALCGTSSS 180 

VSTAAA 186 

<212> Type : PRT 

<211> Length : 186 
SequenceName : SEQ ID 249

SequenceDescription : 
Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :

MNKTLKRRVF RHTALYAAIL MFSHTGGGGG AMAQTRQYAI IMNERNQPEV QWNGSYSIKD 60

KDRKREYTHH NHQQGGSSVS FNNSDELVSR QSGTAVFGTA TYLPPYGKVS GFDAAALKER 120

NNAVDWIHTT HPGLIGYSYD GWCRSATDC PKLVYKTRFS FDNPDLAKTG GGLDKHTEPS 180

RDNSPIYKLK DHPWLGVSFN LGAEGIAKNG KTINKLVSSF NEKNSNNNLV YTTEGRDISL 240

GNWQRETTAM AYYLNAKLHL LDKKQIQNIT DKTVQLGVLK PSIDVRTRNT GTAGILSYWA 300

KWDIKDTGQI PVKLSLTQVK AGRCVNKDNP NKNTKTSSPA LTAPALWFGA GQDGKAEMYS 360

ASVSTYPDSS SSRIFLQNLK RKTDTSRPGR YSLATLNKSD IESREPSFTS RQTVIRLDGG 420

VQQIKLDRNN TEVTGFNGND GKNDTFGIVS EGSFMPDASE WKKVLLPWTV RAFNYDGRFN 480

TVNKEENNGK PKYSQKYRSR NNGKHERNLG DIWSPIVAV GEYLATSAND GMVHIFKQSG 540

GDKRSYNLKL SYIPGTMPRK DIESKDSTLA KELRAFAEKG YVGDRYGVDG GFVLR.RI.TDD 600

QDKQKHFFMF GAMGLGGRGA YALDLTKADD NDPTKASLFD VKDNGNNGNN GNNRVELGYT 660

VGTPQIGKTH NGKYAAFLAS GYATKQIDSG ENKTALYVYD LESNNGTLIR KIEVTDGKGG 720

LSSPTLVDKD LDGTVDIAYA GDRGGKMYRF DLSGNNPNSW TVRTIFQGTK PITSAPAISQ 780

LKDKRWIFG TGSDLSEDDV LSTDEQHIYG IFDNDTNTGT AQEGLGKGLL EQKLSEENKT 840

LFLTDYKRSD GSGDKGWWK LKDGQRVTVK PTWLRTAFV TIHKYTGNDK CGAETAILGI 900

NTADGGKLTK KSARPIVPAA NSKVAQYSGD KKTSSGKSIP IGCMEKDGGT VCPNGYVYDK 960

PVNVRYLDEK KTDGFSTTAD GDAGGSGTFK EGKKPARNNR CFSGKGVRTL LMNDLDSLDI 1020

TGPMCGMKRI SWREVFY 1037
<212> Type : PRT 
<211> Length : 1037 

SequenceName : SEQ ID 250

SequenceDescription : 



Sequence 



<213> OrganismName : Neisseria meningitidis Z2491
<400> PreSequenceString : 

MKHPKQTLIA ALLTTAATAA PLPWTSFSI LGDVAKQIGG ERVSIQSLVG ANQDTHAYHM 60 

TSGDIKKIRS AKLVLINGLG LEAADIQRAV KQSKVSYAEA TKGIQPLKAE EEGGHHHDHD 120

HDHDHDHEGH HHDHGEYDPH VWNDPVLMSA YAQNVAEALI KADPEGKVYY QQRLGNYQMQ 180

LKKLHSDAQA AFNAVPAAKR KVLTGHDAFS YMGKRYHXEF XAPQGVSSEA EPSAKQVAAI 240

IRQIKREGIK AVFTENIKDT RMVDRIAKET GVNVS GKIiYS DALGNAPADT YIGMYRHNIK 300

ALTNAMKQ 308
<212> Type : PRT 
<211> Length : 308 

SequenceName : SEQ ID 251

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString : 

MKKRILSAVL VSGVTLGAAT TVGAEDLiSTK IAKQDSIISN LTTEQKAAQN QVSALQAQVS 60 

SLQSEQDKLT ARNTELEALS KRFEQEIKAL TSQIVARNEK LKNQARSAYK NNETSGYINA 120 

LLNSKSISDV VNRLVAINRA VSANAKLLEQ QKADKVSLEE KQAANQTAIN TIAANMAMAE 180 

ENQNTLRTQQ ANLEAATANL ALQLASATED KANLVAQKEA AEKAAAEALA QEQAAKVKAQ 240

EQAAQQAASV EAAKSAITPA PQATPAAQSS NAIEPAALTA PAAPSARPQT SYDSSNTYPV 300

GQCTWGAKSL APWAGNNWGN GGQWAYSAQA AGYRTGSTPM VGAIAVWNDG GYGHVAVWE 360

VQSASSIRVM ESNYSGRQYI ADHRGWFNPT GVTFIYPH 398
<212> Type : PRT

<211> Length : 398

SequenceName : SEQ ID 252 
SequenceDescription : 

Sequence 
<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MITIKNPKIL KWLKYVLSAI LSLIILVIII GGLLFTFYIS SAPKLSEAQL KSTNSSLVYD 60 

GNNNLIADLG SEKRENVTAD SIPINLVNAI TSIEDKRFFN HRGVDLYRIF GAAFHNLTSQ 120

TTQGGSTLDQ QLIKLAYFST NESDQTLKRK AQEVWLALQM ERKYTKQEIL TFYINKVYMG 180

NGNYGMLTAA KSYYGKDLKD LSYAQLALLA GIPQAPSQYD PYLHPEAAQN RRNWLQQMY 240

MEKHLTKAEY ETAIATPVAE GLQSLQQRST YPKYMDNYLK QVIEEVKKET NKDIFTAGLK 300
VYTNIIPDAQ QTLYNIYHSG DYVYYPDQDF QVASTIVDVT NGHVIAQLGG RMQDENVSFG 360

TNQAVLTDRD WGSTMKPITA YAPAIESGVY TSTAQSTNDS VYYWPGTTTQ LFNWDLRYNG 420

WMTIQAAIML SRNVPAVRAL EAAGLDYARS FLSSLGINYP EMHYSNAISS NNSSSDKKYG 480

ASSEKMAAAY AAFANGGIYH KPRYVNKVEF SDGTSKTFDE KGKRAMKETT AYMMTDMLKT 540

VLTYGTGTAA AIPGVAQAGK TGTSMYTDEE LAKIGEKYGL YPDYVGTLAP DENFVGFTKR 600

YAMAVWTGYK NRLTPVYGSS LEIASDVYRS MMTYLTNGYS EDWTMPNGLY RSGGFLYLSG 660

TYASNTDYTN SVYNNLYSNN TTTASSQTTS DDTSSSNDTS NSTNTDNNGS HPSTDDKKTT 720

H 721
<212> Type : PRT
<211> Length : 721

SequenceName : SEQ ID 253 

SequenceDescription : 



Sequence 

<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString : 

MIITKKSLFV TSVALSLAPL VTAQAQEWTP RSVTEIKSEL VLVDNVFTYT VKYGDTLSTI 60 
AEAMGIDVHV LGDINHIANI DLIFPDTILT ANYNQHGQAT TLTVQAPASS PASVSHVPSS 120

EPLPQASATS QSTVPMAPSA TPSDVPTTPL ASAKPDSFVT ASSELTSSTN DVSTELSSES 180

QKQPEVSQEA VPTPKAAETT EVEPKTDISE DPTSANRPVP NESASEEASS AAPAQAPAEK 240
EETSQMLTAP AAQKAVADTT SVATSNGLSY APNHAYNPMN AGLQPQTAAF KEEVASAFGI 300

TSFSGYRPGD PGDHGKGLAI DFMVPVSSTL GDQVAQYAID HMAERGISYV IWKQRFYAPF 360

ASIYGPAYTW NPMPDRGSIT ENHYDHVHVS FNA 393

<212> Type : PRT

<211> Length : 393 

SequenceName : SEQ ID 254 
SequenceDescription : 

Sequence



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MKKKILLMMS LISVFFAWQL TQAKQVLAEG KVKWTTFYP VYEFTKGVIG NDGDVSMLMK 60 
AGTEPHDFEP STKDIKKIQD ADAFVYMDDN METWVSDVKK SLTSKKVTIV KGTGNMLLVA 120
GAGHDHHHED ADKKHEHNKH SEEGHNHAFD PHVWLSPYRS ITWENIRDS LSKAYPEKAE 180 
NFKANAATYI EKLKELDKDY TAALSDAKQK SFVTQHAAFG YMALDYGLNQ ISINGVTPDA 240 
EPSAKRIATL SKYVKKYGIK YIYFEENASS KVAKTLAKEA GVKAAVLSPL EGLTKKEMKA 300

GQDYFTVMRK NLETLRLTTD VAGKEILPEK DTTKTVYNGY FKDKEVKDRQ LSDWSGSWQS 360

VYPYLQDGTL DQVWDYKAKK SKGKMTAAEY KDYYTTGYKT DVEQIKINGK KKTMTFVRNG 420
EKKTFTYTYA GKEILTYPKG NRGVRFMFEA KEPNAGEFKY VQFSDHAIAP EKAEHFHLYW 480 
GGDSQEKLHK ELEHWPTYYG SDLSGREIAQ EINAH 515 
<212> Type : PRT 
<211> Length : 515 
SequenceName : SEQ ID 255

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MKKFHRFLVS GVILLGFNGL VPTMPSTLIS QQENLVHAAV LGDNYPSKWK KGNGIDSWNM 60 
YIRQCTSFAA FRLSSANGFQ LPKGYGNACT WGHIAKNQGY PVNKTPSIGA IAWFDKNAYQ 120 
SNAAYDHVAW VADIRGDTVT IEEYNYNAGQ GPERYHKRQI PKSQVSGYIH FKDLSSQTSH 180

SYPRQLKHIS QASFDPSGTY HFTTRLPVKG QTSIDSPDLA YYEAGQSVYY DKWTAGGYT 240
WLSYLSFSGN RRYIPIKEPA QSWQNDNTK PSIKVGDTVT FPGVFRVDQL VNNLIVNKEL 300

AGGDPTPLNW IDPTPLDETD NQGKVLGNQI LRVGEYFTVT GSYKVLKIDQ PSNGIYVQIG 360 
SRGTWVNADK ANKL 374 
<212> Type : PRT 

<211> Length : 374

SequenceName : SEQ ID 256 
SequenceDescription : 

Sequence 


<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString : 
MLKFTSNILA TSVAETTQVA PGGCCCCCTT CCFSIATGSG NSQGGSGSYT PGK

<212> Type : PRT
<211> Length : 53

SequenceName : SEQ ID 257

SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MGESYSVEAV LTAVDKTFGK TLQSAIRSIE GLEKRSTGFS SVSQKASSMF KSMLGANLAG 60
QAISAMTRTV SSGLGSMLGE MNSSAKAWKT FDANLADIGF GKKQILAVKT AMQDYATKTI 120
YSASDMASTY AQLAAVGVKD TGKLVKAFGG LAASAENPKQ AMKSISQQMT QAVGRPTVAW 180
QDFRIMLEQT PAGMAKVAKS MGKNLDELVA DIQAGRVKTS DFLEAVKKAG NDKSFQKMAT 240
EFKTVDQAID GMREGLSNKL QPAFEKVNQF GIRAIEAIGK QLDKVDFSKF ASNLGKFLEG 300
INIDKIVSNI SSAVSSVTSK VKEFWDGFKQ TGAISAFSGA LQSVWGALKN VASAMSGGNW 360
KTFGATVGGI VKHVSNFAKA VSDVLGKMDP GRLRSWIATF AAVAGGFKLF EKLTGQSVIG 420
SFLDKIGSKF GLFGNKAKEG TDKASNGARR SGGIISQIFS GLGNIVKSAG TAISTAAKGI 480
GVGIKTALSG IPPYH
<212> Type : PRT
<211> Length : 495

SequenceName : SEQ ID 258

SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MKKGFFLMVM WSLVMIAGC DKSANPKQPT QGMSWTSFY PMYAMTKEVS GDLNDVRMIQ 60
SGAGIHSFEP SVNDVAAIYD ADLFVYHSHT LEAWARDLDP NLKKSKVDVF EASKPLTLDR 120

VKGLEDMEVT QGIDPATLYD PHTWTDPVLA GEEAVNIAKE LGRLDPKHKD SYTKNAKAFK 180
KEAEQLTEEY TQKFKKVRSK TFVTQHTAFS YLAKRFGLKQ LGISGISPEQ EPSPRQLKEI 240
QDFVKEYNVK TIFAEDNVNP KIAHAIAKST GAKVKTLSPL EAAPSGNKTY LENLRANLEV 300
LYQQLK 306

<212> Type : PRT
<211> Length : 306

SequenceName : SEQ ID 259

SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MEKKQRFSLR KYKSGTFSVL IGSVFLMMTT TVAADELSTM SEPTITNHTQ QQAQHLTNTE 60

LSSAESKSQD TSQITPKTNR EKEQPQGLVS EPTTTELADT DAAPMANTGP DATQKSASLP 120

PVNTDVHDWV KTKGAWDKGY KGQGKWAVI DTGIDPAHQS MRISDVSTAK VKSKEDMLAR 180

QKAAGINYGS WINDKWFAH NYVENSDNIK ENQFEDFDED WENFEFDAEA EPKAIKKHKI 240

YRPQSTQAPK ETVIKTEETD GSHDIDWTQT DDDTKYESHG MHVTGIVAGN SKEAAATGER 300

FLGIAPEAQV MFMRVFANDV MGSAESLFIK AIEDAVALGA DVINLSLGTA NGAQLSGSKP 360

LMEAIEKAKK AGVS WVAAG NERVYGSDHD DPLAINPDYG LVGSPSTGRT PTSVAAINSK 420

WVIQRLMTVK ELENRADLNH GKAIYSESVD FKNIKDSLGY DKSHQFAYVK ESTDAGYKAQ 480

DVKDKIALIE RDPNKTYDEM IALAKKHGAL GVLIFNNKPG QSNRSMRLTA NGMGIPSAFI 540

SHEFGKAMSQ LNGNGTGSLE FDSWSKAPS QKGNEMNHFS NWGLTSDGYL KPDITAPGGD 600

IYSTYNDNHY GSQTGTSMAS PQIAGASLLV KQYLEKTQPN LPKEKIADIV KNLLMSNAQI 660

HVNPETKTTT SPRQQGAGLL NIDGAVTSGL YVTGKDNYGS ISLGNITDTM TFDVTVHNLS 720

NKDKTLRYDT ELLTDHVDPQ KGRFTLTSRS LKTYQGGEVT VPANGKVTVR VTMDVSQFTK 780

ELTKQMSNGY YLEGFVRFRD SQDDQLNRVN IPFVGFKGQF ENLAVAEESI YRLKSQGKTG 840

FYFDESGPKD DIYVGKHFTG LVTLGSETNV STKTISDNGL HTLGTFKNAD GKFILEKNAQ 900

GNPVLAISPN GDNNQDFAAF KGVFLRKYQG LKASVYHASD KEHKNPLWVS PESFKGDKNF 960

NSDIRFAKST TLLGTAFSGK SLTGAELPDG YYHYWSYYP DWGAKRQEM TFDMILDRQK 1020

PVLSQATFDP ETNRFKPEPL KDRGLAGVRK DSVFYLERKD NKPYTVTIND SYKYVSVEDN 1080

KTFVERQADG SFILPLDKAK LGDFYYMVED FAGNVAIAKL GDHLPQTLGK TPIKLKLTDG 1140

NYQTKETLKD NLEMTQSDTG LVTNQAQLAV VHRNQPQSQL TKMNQDFFIS PNEDGNKDFV 1200

AFKGLKNNVY NDLTVNVYAK DDHQKQTPIW SSQAGASASA IESTAWYGIT ARGSKVMPGD 1260

YQYWTYRDE HGKEHQKQYT ISVNDKKPMI TQGRFDTING VDHFTPDKTK ALGSSGIVRE 1320

EVFYLAKKNG RKFDVTEGKD GITVSDNKMY IPKNPDGSYT ISKRDGVTLS DYYYLVEDRA 1380

GNVSFATLRD LKAVGKDKAV VNFGLDLPVP EDKQIVNFTY L VRDADGKP n ENLEYYNNSG 1440

NSLILPYGKY TVELLTYDTN AAKLESDKIV SFTLSADNNF QQVTFKMTML ATSQITAHFD 1500

HLLPEGSRVS LKTAQGQLIP LEQSLYVPKA YGKTVQEGTY EWVSLPKGY"* RIEGNTKVNT 1560

LPNEVHELSL RLVKVGDASD STGDHKVMSK NNSQALTAFA TPTKTTTSAT AKALPSAGEK 1620

MGLKLRIVGL VLLGLTCVFS RKKSTKD 1647



<212> Type : PRT 

<211> Length : 1647 

SequenceName : SEQ ID 260
SequenceDescription :


Sequence 



<213> OrganismName : Treponema pallidum subsp. pallidum str. Nichols
<400> PreSequenceString :

MMRSLFSGVS GMQNHQTRMD VIGNNVANVN TTGFKRGRVN FQDLISQQLS AAARPNEEVG 60
GVNPKEVGLG VLIASIDTVH TQGALQTTGI NTDVSIQGSG FFVLKSGEKT FFTRAGAFGV 120

DNAGTLVNPA NGMRVQGWMA QDVAGERLIN SSAQTQDLVI PIGQKIDAQQ TSTVHYACNL 180
DKRLPELAAD ANEADVRKST WTTDFQVYDS FGQQHTLQIN FSRVPGTNNC WQATVAVDPG 240
TEVDTQTRVG VGTSDGAANT FIVNFDNFGH LASVTDTAGN VTGPTGQVLL EASYDWGAN 300

PDDAGQVTRH AFTLNLGEIG TARNTITQFA ERSTTKAYRQ DGYAMG YL EKT FKIDQSGVIT 360

GVYSNGVSQD IGQLALAGFA NQGGLEKAGE NTYVQSNNSG IANISTSGVM GKGKLIAGTL 420
EMSNVDLTDQ FTDMIITQKG FQAGAKTIQT SDTMLDTVLS LKR 463
<212> Type : PRT 
<211> Length : 463 

SequenceName : SEQ ID 261

SequenceDescription :

Sequence



<213> OrganismName : Treponema pallidum subsp. pallidum str. Nichols



<400> PreSequenceString : 

MGCMRWGSVL CVWGVGASG GVLGQEFSPK LTGSATLEWG ISYGKGVGSI GQAPGAVMGT 60

GPYNLKHGFR TTNTVGVSFP LVMRTTHTRR GQHPALYAEL KVADLQADLS QGKAGFAVKR 120

KGKVEATLHC YGAYLTIGKN PTFLTNFARL WKPWVTAQYQ EDAVQYAPG^ GGLGGKVGYR 180

AQDIGGSGVS LDVGFLSFAS NGAWDSTDPT HSKYGFGADL KLMYARAGHE LCTVELASNV 240

TLEDGYLIGA QKDANNQNKD KLLWNVGGRL TLEPGAGFRF SFALDAGNQE QSAQDFQNRT 300

QRAQSELTAL SNNLFQGESQ KQEAWVTQW QQATQTVTAG VRSALESRGT TYINALEAVQ 360

PNPAKPTGKV VQNLHTPQGS PPNLPPLPAL PAFSLMGQVL LQYDAEQWK1 GFEQVQTQIV 420

TEINQKVQAA VAKNNANMQA VGGSLGDTAR MVGEALIKQQ LSRKQNSILT MVSVQDEVKQ 480

DLADLVPMMR TEITAFFASV QQHITEEVKK KTDALNAGQQ IRQAIQNLRA SAWRAFLMGV 540

SAVCLYLDTY NVAFDALFTA QWKWLSSGIY FATAPANVFG TRVLDNTIAS CGDFAGFLKL 600

ETKSGDPYTH LLTGLDAGVE TRVYIPLTHD LYKNNNGNPL PSGGSSGHIS LPWGKAWCS 660

YRIPVQDYGW VKPSVTVHAS TNRAHLNAPA AGGAVGATYL TKEYCAQLRZ^ GISASLIEKT 720

VFSLDWEQGM LSDVPYLLVS ECLTQGIGRI VCGVTLSW 758

<212> Type : PRT

<211> Length : 758 

SequenceName : SEQ ID 262
SequenceDescription : 



Sequence



<213> OrganismName : Treponema pallidum subsp. pallidum str. Nichols



<400> PreSequenceString :

MGRQVMQAGV LAGMVCAASG YAGVLTPQVS GTAQLQWGIA FQKNPRTGPG KHTHGFRTTN 60

SLTISLPLVS KHTHTRRGEA RSGVWAQLQL KDLAVELASS KSSTALSFTEC PTASFQATLH 120

CYGAYLTVGT SPSCWNFAQ LWKPFVTRAY SEKDTRYAPG FSGSGAKLGVT QAHNVGNSGV 180

DVDIGFLSFL SNGAWDSTDT THSKYGFGAD ATLSYGVDRQ RLLTLELAGIS3* ATLDQNYVKG 240

TEDSKNENKT ALLWGVGGRL TLEPGAGFRF SFALDAGNQH QSNAHAQTQE RAILKAREVF 300

RRVEGKLVQN LPNIMMPPGI TEQTTLIEMV GLAALIAEGT LGSAIQTVL^ AGALAALVSQ 360

LVPNIEQGVR DVFRSSDPRV VTAKLLAFLE RAPMNALNID ALLRMQWKWELj SSGIYFATAG 420

TNIFGKRVFA TTRAHYFDFA GFLKLETKSG DPYTHLLTGL NAGVEARVYH PLTYIRYRNN 480

GGYELNGAVP PGTINMPILG KAWCSYRIPL GSHAWLAPHT SVLGTTNRF1S3" IINPAGNLLN 540

ERALQYQVGL TFSPFEKVEL SAQWEQGVLA DAPYMGIAES IWSERHFGTHj VCGMKVTW 598



<212> Type : PRT

<211> Length : 598 

SequenceName : SEQ ID 263
SequenceDescription :



Sequence 

<213> OrganismName : Treponema pallidum subsp. pallidum str. Nichols
<400> PreSequenceString : 

MGRQVMQAGV LAGMVCAASG YAGVLTPQVS GTAQLQWGIA FQKNPRTGPG KHTHGFRTTN 60 

SLTISLPLVS KHTHTRRGEA RSGVWAQLQL KDLAVELASS KSSTALSFTK PTASFQATLH 120

CYGAYLTVGT SPSCWNFAQ LWKPFVTRAY SEKDTRYAPG FSGSGAKLGY QAHNVGNSGV 180

DVDIGFLSFL SNGAWDSTDT THSKYGFGAD ATLSYGVDRQ RLLTLELAGN ATLDQNYVKG 240

TEDSKNENKT ALLWGVGGRL TLEPGAGFRF SFALDAGNQH QSNAHAQTQE RAILKAREVF 300

RRVEGKLVQN LPNIMMPPGI TEQTTLIEMV GLAALIAEGT LGSAIQTVLA AGALAALVSQ 360

LVPNIEQGVR DVFRSSDPRV VTAKLLAFLE RAPMNALNID ALLRMQWKWL SSGIYFATAG 420
TNIFGKRVFA TTRAHYFDFA GFLKLETKSG DPYTHLLTGL NAGVEARVYI PLTYIRYRNN 480

GGYELNGAVP PGTINMPILG KAWCSYRIPL GSHAWLAPHT SVLGTTNRFN IINPAGNLLN 540

ERALQYQVGL TFSPFEKVEL SAQWEQGVLA DAPYMGIAES IWSERHFGTIi VCGMKVTW 598



<212> Type : PRT 
<211> Length : 598 
SequenceName : SEQ ID 264

SequenceDescription : 



Sequence 



<213> OrganismName : SARS coronavirus Frankfurt 1
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60 

PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRWVLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLIiRS TS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLECPTKRSFI EDLLFNKVTIi ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDFGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255



<212> Type : PRT 
50 <211> Length : 1255 

SequenceName : SEQ ID 2 65 
SequenceDescription : 



Sequence

<213> OrganismName : SARS coronavirus HSR 1
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRWVLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255
SequenceName : SEQ ID 266

SequenceDescription : 

Sequence 



<213> OrganismName : SARS coronavirus ZJ01
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNXDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRWVLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEX LDXSPCSFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255

SequenceName : SEQ ID 267 
SequenceDescription : 

Sequence 

<213> OrganismName : SARS coronavirus TW1
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255
SequenceName : SEQ ID 268

SequenceDescription :

Sequence 


<213> OrganismName : SARS coronavirus CUHK-Su10
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPGSFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255

SequenceName : SEQ ID 269
SequenceDescription :

Sequence 

<213> OrganismName : SARS coronavirus Urbani
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255
SequenceName : SEQ ID 270

SequenceDescription : 

Sequence 

<213> OrganismName : SARS coronavirus
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINWS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCAFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255

SequenceName : SEQ ID 271 
SequenceDescription : 






Sequence 



<213> OrganismName : SARS coronavirus Tor2
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFGNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDIWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCAFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT

<211> Length : 1255

SequenceName : SEQ ID 272
SequenceDescription :
Sequence 



<213> OrganismName : SARS coronavirus GD01
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFDNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFLP 240

AQDTWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS RDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKRISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRWVLS YELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKDFG 780

GFNFSQILPD PLKSTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASAMLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255

SequenceName : SEQ ID 273 
SequenceDescription : 

Sequence 


<213> OrganismName : SARS coronavirus CUHK-W1
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFDNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDTWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRWVLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDC 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT

<211> Length : 1255

SequenceName : SEQ ID 274
SequenceDescription :

Sequence



<213> OrganismName : SARS coronavirus BJ01 
<400> PreSequenceString :

MFIFLLFLTL TSGSDLDRCT TFDDVQAPNY TQHTSSMRGV YYPDEIFRSD TLYLTQDLFL 60

PFYSNVTGFH TINHTFDNPV IPFKDGIYFA ATEKSNWRG WVFGSTMNNK SQSVIIINNS 120

TNWIRACNF ELCDNPFFAV SKPMGTQTHT MIFDNAFNCT FEYISDAFSL DVSEKSGNFK 180

HLREFVFKNK DGFLYVYKGY QPIDWRDLP SGFNTLKPIF KLPLGINITN FRAILTAFSP 240

AQDTWGTSAA AYFVGYLKPT TFMLKYDENG TITDAVDCSQ NPLAELKCSV KSFEIDKGIY 300

QTSNFRWPS GDWRFPNIT NLCPFGEVFN ATKFPSVYAW ERKKISNCVA DYSVLYNSTF 360

FSTFKCYGVS ATKLNDLCFS NVYADSFWK GDDVRQIAPG QTGVIADYNY KLPDDFMGCV 420

LAWNTRNIDA TSTGNYNYKY RYLRHGKLRP FERDISNVPF SPDGKPCTPP ALNCYWPLND 480

YGFYTTTGIG YQPYRVWLS FELLNAPATV CGPKLSTDLI KNQCVNFNFN GLTGTGVLTP 540

SSKRFQPFQQ FGRDVSDFTD SVRDPKTSEI LDISPCSFGG VSVITPGTNA SSEVAVLYQD 600

VNCTDVSTAI HADQLTPAWR IYSTGNNVFQ TQAGCLIGAE HVDTSYECDI PIGAGICASY 660

HTVSLLRSTS QKSIVAYTMS LGADSSIAYS NNTIAIPTNF SISITTEVMP VSMAKTSVDG 720

NMYICGDSTE CANLLLQYGS FCTQLNRALS GIAAEQDRNT REVFAQVKQM YKTPTLKYFG 780

GFNFSQILPD PLKPTKRSFI EDLLFNKVTL ADAGFMKQYG ECLGDINARD LICAQKFNGL 840

TVLPPLLTDD MIAAYTAALV SGTATAGWTF GAGAALQIPF AMQMAYRFNG IGVTQNVLYE 900

NQKQIANQFN KAISQIQESL TTTSTALGKL QDWNQNAQA LNTLVKQLSS NFGAISSVLN 960

DILSRLDKVE AEVQIDRLIT GRLQSLQTYV TQQLIRAAEI RASANLAATK MSECVLGQSK 1020

RVDFCGKGYH LMSFPQAAPH GWFLHVTYV PSQERNFTTA PAICHEGKAY FPREGVFVFN 1080

GTSWFITQRN FFSPQIITTD NTFVSGNCDV VIGIINNTVY DPLQPELDSF KEELDKYFKN 1140

HTSPDVDLGD ISGINASWN IQKEIDRLNE VAKNLNESLI DLQELGKYEQ YIKWPWYVWL 1200

GFIAGLIAIV MVTILLCCMT SCCSCLKGAC SCGSCCKFDE DDSEPVLKGV KLHYT 1255

<212> Type : PRT
<211> Length : 1255

SequenceName : SEQ ID 275
SequenceDescription :



Sequence 

<213> OrganismName : SARS coronavirus 
<400> PreSequenceString : 

SGFRKMAFPS GKVEGCMVQV TCGTTTLNGL WLDDTVYCPR HVICTAEDML NPNYEDLLIR 60

KSNHSFLVQA GNVQLRVIGH SMQNCLLRLK VDTSNPKTPK YKFVRIQPGQ TFSVLACYNG 120

SPSGVYQCAM RPNHTIKGSF LNGSCGSVGF NIDYDCVSFC YMHHMELPTG VHAGTDLEGK 180

FYGPFVDRQT AQAAGTDTTI TLNVLAWLYA AVINGDRWFL NRFTTTLNDF NLVAMKYNYE 240

PLTQDHVDIL GPLSAQTGIA VLDMCAALKE LLQNGMNGRT ILGSTILEDE FTPFDWRQC 300

SGVTFQ 306

<212> Type : PRT

<211> Length : 306

SequenceName : SEQ ID 276 
SequenceDescription : 



Sequence 

<213> OrganismName : SARS coronavirus
<400> PreSequenceString :

AIASEFSSLP SYAAYATAQE AYEQAVANGD SEWLKKLKK SLNVAKSEFD RDAAMQRKLE 60

KMADQAMTQM YKQARSEDKR AKVTSAMQTM LFTMLRKLDN DALNNIINNA RDGCVPLNII 120

PLTTAAKLMV WPDYGTYKN TCDGNTFTYA SALWEIQQW DADSKIVQLS EINMDNSPNL 180

AWPLIVTALR ANSAVKLQ 198

<212> Type : PRT 

<211> Length : 198 

SequenceName : SEQ ID 277 
SequenceDescription :



Sequence 



<213> OrganismName : SARS coronavirus 
<400> PreSequenceString :

AGNATEVPAN STVLSFCAFA VDPAKAYKDY LASGGQPITN CVKMLCTHTG TGQAITVTPE 60

ANMDQESFGG ASCCLYCRCH IDHPNPKGFC DLKGKYVQIP TTCANDPVGF TLRNTVCTVC 120

GMWKGYGCSC DQLREPLMQ 139

<212> Type : PRT
<211> Length : 139

SequenceName : SEQ ID 278 
SequenceDescription : 
Sequence

<213> OrganismName : SARS coronavirus
<400> PreSequenceString :

NNELSPVALR QMSCAAGTTQ TACTDDNALA YYNNSKGGRF VLALLSDHQD LKWARFPKSD 60

GTGTIYTELE PPCRFVTDTP KGPKVKYLYF IKGLNNLNRG MVLGSLAATV RLQ 113

<212> Type : PRT
<211> Length : 113

SequenceName : SEQ ID 279
SequenceDescription :



Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNKIFKVIWN PATGNYTVTS ETAKSRGKKS GRSKLLISAL VAGGMLSSFG ALANAGNDNG 60

QGVDYGSGSA GDGWAIGKG AKANTFMNTS GSSTAVGYDA IAEGQYSSAI GSKTHAIGGA 120

SMAFGVSAIS EGDRSIALGA SSYSLGQYSM ALGRYSKALG KLSIAMGDSS KAEGANAIAL 180

GNATKATEIM SIALGDTANA SKAYSMALGA SSVASEENAI AIGAETEAAE NATAIGNNAK 240

AKGTNSMAMG FGSLADKVNT IALGNGSQAL ADNAIAIGQG NKADGVDAIA LGNGSQSRGL 300

NTIALGTASN ATGDKSLALG SNSSANGINS VALGADSIAD LDNTVSVGNS SLKRKIVNVK 360

NGAIKSDSYD AINGSQLYAI SDSVAKRLGG GAAVDVDDGT VTAPTYNLKN GSKNNVGAAL 420

AVLDENTLQW DQTKGKYSAA HGTSSPTASV ITDVADGTIS ASSKDAVNGS QLKATNDDVE 480

ANTANIATNT SNIATNTANI ATNTTNITNL TDSVGDLQAD ALLWNETKKA FSAAHGQDTT 540

SKITNVKDAD LTADSTDAVN GSQLKTTNDA VATNTTNIAN NTSNIATNTT NISNLTETVT 600

NLGEDALKWD KDNGVFTAAH GTETTSKITN VKDGDLTTGS TDAVNGSQLK TTNDAVATNT 660

TNIATNTTNI SNLTETVTNL GEDALKWDKD NGVFTAAHGN NTASKITNIL DGTVTATSSD 720

AINGSQLYDL SSNIATYFGG NASVNTDGVF TGPTYKIGET NYYNVGDALA AINSSFSTSL 780

GDALLWDATA GKFSAKHGTN GDASVITDVA DGEISDSSSD AVNGSQLHGV SSYWDALGG 840

GAEVNADGTI TAPTYTIANA DYDNVGDALN AIDTTLDDAL LWDADAGENG AFSAAHGKDK 900

TASVITNVAN GAISAASSDA INGSQLYTTN KYIADALGGD AEVNADGTIT APTYTIANAE 960

YNNVGDALDA LDDNALLWDE TANGGAGAYN ASHDGKASII TNVANGSISE DSTDAVNGSQ 1020

LNATNMMIEQ NTQIINQLAG NTDATYIQEN GAGINYVRTN DDGLAFNDAS AQGVGATAIG 1080

YNSVAKGDSS VAIGQGSYSD VDTGIALGSS SVSSRVIAKG SRDTSITENG WIGYDTTDG 1140

ELLGALSIGD DGKYRQIINV ADGSEAHDAV TVRQLQNAIG AVATTPTKYF HANSTEEDSL 1200

AVGTDSLAMG AKTIVNGDKG IGIGYGAYVD ANALNGIAIG SNAQVIHVNS IAIGNGSTTT 1260

RGAQTNYTAY NMDAPQNSVG EFSVGSADGQ RQITNVAAGS ADTDAVNVGQ LKVTDAQVSQ 1320

NTQSITNLDN RVTNLDSRVT NIENGIGDIV TTGSTKYFKT NTDGVDASAQ GKDSVAIGSG 1380

SIAAADNSVA LGTGSVATEE NTISVGSSTN QRRITNVAAG KNATDAVNVA QLKSSEAGGV 1440

RYDTKADGSI DYSNITLGGG NGGTTRISNV SAGWNNDW NYAQLKQSVQ ETKQYTDQRM 1500

VEMDNKLSKT ESKLSGGIAS AMAMTGLPQA YTPGASMASI GGGTYNGESA VALGVSMVSA 1560

NGRWVYKLQG STNSQGEYSA ALGAGIQW 1588

<212> Type : PRT
<211> Length : 1588

SequenceName : SEQ ID 280
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MPASAVGALG EASYTVTANV TDSAGNSNSA SHNVQVNTAL PGVTINPVAT DDIINAAESG 60

NAQTISGQVT GAAAGDTVTV TLGGKTYTAT VQGNLSWSVD VPAADIQAIG NGNLTVNASV 120

TNGVGNTGSG SRDITIDANL PGLRVDTVAG DDWNSIEHA QALVITGSSS GLAAGAALTV 180

VINTVTYAAT VLADGTWSVG VPAADVSNWP AGTVNITVSG TNTAGTTSTI THPVTVDLAA 240

VAISINTVSG DDVINAAEKG ADLTLSGSTS GVEVGQTVTV TFGGKTYTAT VAGDGSWTTT 300

VPAADLSVLR DGDATVQASV STINGNTASA THAYSVDATA PTLAINTIAT DDILNAAEAG 360

NPLTISGSST AEAGQTVTVT LNGVTYSGSV QADGSWSVSL PTADLSNLTA SQYTVSASVS 420

DKAGNPASAN HGLAVDLTVP VLTINTVSGD DIINAAEHGQ ALVISGSSTG GEAGDVITVT 480

LNSKTYTTML DASGNWSVGV PAADVTALGS GPQTITAAIT DAAGNSDDAS RTVTVNLAAP 540

TIGINTIATD DVIKATEKGA DLQITGTSNQ PAGTTITVTL NGQNYTATTD SNGNWSATVP 600

ASAVSALGEA NYTVTANVTD TAGNSNSASH NVLVNSALPA VTINAVATDD IINAAESGNA 660

QTISGQVTGA AQGDTVTVTL GGNTYTATVQ SNLSWSVDVP AADIQALGNG DLTVNASVTN 720

GVGNTGSGSR DITIDANLPG LRVDTVAGDD VINSIEHNQA LVITGSSSGL TAGTALTVEI 780



NNVTYGATVL ADGTWSLGVP AVDVSNWPAG TVNITVSGTN SAGTTSTITH PVTVDLAGVA 840

ITINTLSGDD VINAVEKGET LWSGSTSGV EAGQTVTVTF GGKNYTTTVE ANGSWTVNVP 900

PADLAALPDG AGNVQASVSN INGNSAQADR AYSVDATAPL VTINTIASDD ILNVSEAGAG 960

ITISGTTTAQ AGQTLTVTLN NNTYQTTVLA DGTWSVNVPA ADLSGLTASS YTVTATVSDK 1020

AGNPASADHA LWDITAPDL TINTVAGDDI INAIEHGQAL WSGTSTGAA AGDWTVTLN 1080

GKNYTTTLDA SGNWSVGIPA ADVTALATGS QTITASLSDR AGNSDSTTHD VTVDLSGPTL 1140

TINTVSGDDI INAAEIWAQ TISGQVTGTA VAGNTVIVTI GGNQYNATVQ SDLSWSVSVP 1200

ANVLQALGNG ELTISASLTN SANNTGTATH DIVIDANLPG LRVDTVAGDD VINSIEHTQA 1260

LVITGSSSGL AAGAALTWI NSVTYGATVL ADGSWSVGVP VADVTNWPAG TVNIAVSGTN 1320

TAGTTTSISH PVTVDLAAVA ITINTLSTDD VINAAEKGSD LQLSGTTSGV EAGQTITVIF 1380

GGKSYTTTVA ADNTWGLTIP AVDVATLPDG AANVQASVSN VAGNSTQATH AYSVDATAPS 1440

VTINTIATDD ILMAAEAGSA LTISGTSTAE AGQTVTVTLN GVNYSGNVQA DGSWSVSVPT 1500

GDLASLTASS YTVNASVSDK ARNSASATHN LTVDLAAPW TINTVAGDDI INATEHGQAQ 1560

IISGSATGAT TGNTVSVTIG TTTYTTVLDA NGNWSIGVPA SVISALAQGD VTITATVTDS 1620

AGNSGTASHT VTVALGAPVL AINTIAVDDI INAAEKGADL AITGTSNQPA GTQITVTLNG 1680

QNYTTTADAS GNWSVTVPAS RVSALGEATY TVTAAATDAD GNSGSASHNV QVNTALPGVT 1740

INWATDDII NAAEAGVEQT ISGQVTGAAA GDTVTVTLGG ATYTATVQAN LSWSVDVPAS 1800

ALQELGNGEL TISASVTNSV GNTGNGTREI TIDANLPGLR VDTVAGDDW NIIEHGQALV 1860

ITGSSSGLAA GSNVTLTING QTYVAAVLAD GTWSVGVPAV DVSAWPAGSV TIAASGSTSA 1920

GNPVSVTHPV TVDLSAVAVS INAITADDVI NAAEKGAALT LSGSTSGVEA GQTVTVTFGG 1980

KTYSATVAAN GSWSTSVPAA DMAALRDGDA SAQASVSNVN GNSATTTHAY SVDASAPTVT 2040

INTIAGDDIL NAAEAGAALT ITGSSTAEAG QTVTVTLNGT NYTGTVQTDG SWSVSVPSAD 2100

LSTLTASNYT VNAAVSDKAG NPASVNHNLT VDTSVPWTI NTVAGDDVIN ATEHAQAQII 2160

SGSATGAATG STVTVTIGTN TFTTVLDASG NWSVGVPASV VSALANGTVT INASVTDAGG 2220

NSGSATHQVT VNTGLPTITF NAISGDNILN ADEKGQPLTI SGGSTGLATG AQVTVTLNGH 2280

NYSATTDASG NWTLTVPVSD LAALGQANYT VSASATSAAG NTASSQANLL VDSGLPDVTI 2340

NTVAGDDIIN AAEAGADQTI SGWTRAAAG DTVTVTLGGN TYTATVQSNL SWSVSVPTAD 2400

LQALGNGDLT ITASVTNANG NTGSGTRDIT IDANLPGLRV DTVAGDDIVN SIEHGQALVI 2460

TGGSSGLNAG AVLTVTINSV AYSATVQADG SWSVGIPAAN VSAWPAGPLT VEVDGQSSAN 2520

MPVSVSHPFT VDLTAVAISI NTVASDDVIN AAEKGTNLTL SGSTSGIESG QTVTVTFGGK 2580

TYTASVAANG SWSVNVPAAD LATLPEGAAN VQASVSSASG NSASATHAYS VDASAPTLTI 2640

NTIASDDILN AAEAGSPLTI SGTSTAETGQ TVTVTLNGAT YTGTVQADGS WSVSVPTSAL 2700

GALNASNYTV SATVNDKAGW PGSASHNLAV DTTAPVLTIN TVAGDDIIND AEHAQALVIS 2760

GTSSGGEAGD WSWLNGKT YTTTLDASGN WSVGVPAADV TALGSGAQTI TASVSDRAGN 2820

SDDASRTVTV SLSAPVISIN TIAGDDVINA TEKGSDLALS GTSDQPAGTA ITVTLNGQNY 2880

SATTDASGNW SVTVPASAVS ALGEATYSVT ASVTNAQGNS STASHNVQVN TALPGITINP 2940

VATDDIINAS EAGSAQTISG QVTGAAAGST VTVELGGKTY TATVQADLSW NVSVPAADWQ 3000

ALGNGELTVN ASVTNAVGNT GSGTRDITID ASLPGLRVDT VAGDDWNII EHAQAQVITG 3060

SSSGFAAGTA LTWINNQTY AATVLANGSW SVGVPATDVS NWPAGTLNIT VSGANSAGTQ 3120

TSITHPLTVD LTAVAISMNS ITSDDAINAA EKGAALTLSG STSGVEAGQT VTVTFGGKTY 3180

TTTVAANGSW STTVPAADLA ALRDGDASAQ VRVTNVNGNS ATATHEYSVD SAAPTVTINT 3240

IASDNIINAS EAAAGVTVSG TSTAQTGQTL TVTLNGTNYQ TTVQTDGSWS LTLPASDLTA 3300

LANNGYTLTA TVSDLAGMLG SASKGVTVDT TAPVISFNTV AGDDVINNVE HIQAQIISGT 3360

ATGAVAGDRL WTIAGQQYV TSTDASGNWS VGVPASVISG LADGTVTISA TITDSAGNSS 3420

TQTHNVQVNT AAVSLSVSTI SGDNLINAAE AGSALTLSGT GTNFATGTW TVLLNGKGYS 3480

ATIQSNGSWS VNVPAADVAA LSDGTSYTVS ASAQDSAGNG NSSTQTHNVQ VNTAAVSLSV 3540

STISGDNLIN AAEAGSALTL SGTGTNFATG TWTVLLNGK GYSATIQSNG SWSVNVPAAD 3600

VAALSDGTSY TVSASAQDSA GNSATASRSV AVDLTAPVIS INTVSTDDRL NAAEQQQPLT 3660

LNGSTSAEVG QTVTVTFGGK TYTATVAANG TWALNVPAVD LAALGQGAQT ITASVNDRAG 3720

NPGQATHALT VDTVAPTVTI ATVAGDDIIN NAEQLAGQTI SGTTTAEVGQ TVTVTFNGQT 3780

WSATVGSGGS WSVFIPAQQF AGLSDGSYTI SATVSDQAGN PGSASRGVTL NGDVPTVTIN 3840

TFAGDDWNA AEHGSSLVIS GTTTAPVGQT LTLTLNGKTY TTTVQTGGSW SYTLGSADVT 3900

ALADGNAYVI NASVSNAIGN TGSSNHTITV DLSAPAMGIN IDSLQADTGL SASDFITSVS 3960

PVWNGSLTA ALASNETAQI SIDGGTTWTT LTVTGTTWRY NDSRTLTDGN YLYQVRVIDA 4020

AGNVGATDSQ NWIDTTAPD PAVKTIAISA ITTDMGLITN DFVTSDTTLA VSGTLGATLS 4080

AGEFAQISLD GGVTWTTLTV VGTSWSYADG HTLTDGTWNY TVRWDLAGN VGQTATQNW 4140

VDTTSPEAAK SITITGISDD TGTSSSDFIT SDTTLTVRGV LGAALGANEF AQISTDNGAT 4200

WVNVTVAADS LNWSYVDGRT LTNGTTTWQV RWDLAGNVG ATSSQSALID TVNPAQVLTI 4260

ASISTDTGSS ATDFITSDTM LTLTGSLGAG LASGEVAQIS LDSGATWTTL TTNGTQWTYT 4320

DSRTLTDGSY VYQVRVLDLA GNTGPWSKT VWDTINPTA TPTIVSYTDD VGQRQGTLSS 4380

SQATDDTTPL LNGVLSAPLA SGEWYLYRN GLLLGAVTMV GALNWTYSDS GLVSGAYTYS 4440

ARWDLAGNI TSSSDFVLTV DTSIPTTLAQ ITSQTTRDTT PIISGVITAA LASGQYVEW 4500

INGKTYTSEP GGAVWDPAH NTWYVQLPDT DALTVSATAY TVTAQVKSSA GNGNNANISN 4560

GTVTVNAAID YTPTWTTASK TTAWGLTYGL DSHGMWTVLA NQQVMQSTDP LTWSKTALTL 4620

YQSGNNYATS SIADYDRNGT GDLFITRDDY GTGYINGFTN NGDGTFSSAI QVTVGTLTWY 4680

GSIVAFDKEG DGYLDFWIGD AGGPDSNTFL WNNAGTLVGN STTSNSGGSA TVGGAVTGYL 4740

SLNEGSGVDL NNDGRIDLVQ HTYNLNNYYT LSSLINQGNG TFVWGQNTTN TFLSGAGSGA 4800

MSSSVSMTWA DFDGDGDMDL FLPASQGRAN YGSLLFNTNG VLGCPVAVGA TATTYASQFS 4860

LAVDWNHDGL MDIARIAQTG QSYLYTNVSN ASNWTQSALG GSQSGTTSGV AAMDYDWDGA 4920

VDVLVSKQSG SVFLSRNTNT VSYGTSLHLR ITDPNGINVY YGNTVKLYNS AGVLVATQII 4980

NPQSGMGVND TSALVNFYGL NAGETYNAVL IKSTGTTASN IDQTVNTSWG GLQATDATHA 5040

YDLSAEAGTA SNNGKFVGTG YNDTFFATAG TDTYDGSGGW VYSSGTGTWL ANGGMDWDF 5100

RLSTVGVTAN LSSTAAQATG FNTSTFTNIE GISGSNFNDI LTGSSGDNQL EGRGGNDTLN 5160

IGNGGHDTLL YKLLNASDAT GGNGSDWNG FTVGTWEGTA DTDRIDIREL LQGSGYTGNG 5220

KASYVNGVAT LDAQAGNIGD FVKVTQSGSD TIVQIDRDGT GGTFATTNW TLTGVHTDLA 5280

TLLANHQLMV V 5291

<212> Type : PRT
<211> Length : 5291

SequenceName : SEQ ID 281
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MGVHTAEATL PNGNNDTKIV NIAPDASNAQ VTLNIPAQQV VTNNSDSVQL TATVKDPSNH 60

PVAGITVNFT MPQDVAANFT LENNGIAITQ ANGEAHVTLK GKKAGTHTVT ATLSNNNTSD 120

SQPVTFVADK TSALWLQIS KNEITGNGVD SATLTATVKD QFDNEVNNLP VTFSTASSGL 180

TLTPGESNTN ESGIAQATLA GVAFGEQTVT ASLANNGASD NKTVHFIGDT AAAKIIELTP 240

VPDSIIAGTP QNSSGSVITA TWDNNGFPV KGVTVNFTSN AATAEMTNGG QAVTNEQGKA 300

TVTYTNTRSS IESGARPDTV EASLENGSST LSTSINVNAD ASTAHLTLLQ ALFDTVSAGD 360

TTNLYIEVKD NYGNGVPQQE VTLSVSPSEG VTPSNNAIYT TNHDGNFYAS FTATKAGVYQ 420

VTATLENGDS MQQTVTYVPN VANAEISLAA SKDPVIANNN DLTTLTATVA DTEGNAIANS 480

EVTFTLPEDV RANFTLGDGG KWTDTEGKA KVTLKGTKAG AHTVTASMAG GKSEQLWNF 540

IADTLTAQVN LNVTEDNFIA NNVGMTRLQA TVTDGNGNPL ANEAVTFTLP ADVSASFTLG 600

QGGSAITDIN GKAEVTLSGT KSGTYPVTVS VNNYGVSDTK QVTLIADAGT AKLASLTSVY 660

SFWSTTEGA TMTASVTDAN GNPVEGIKVN FRGTSVTLSS TSVETDDRGF AEILVTSTEV 720

GLKTVSASLA DKPTEVISRL LNAKADINSA TITSLEIPEG QVMVAQDVAV KAHVNDQFGUT 780

PILNESVTFS AEPPEHMTIS QNIVSTDTHG IAEVTMTPER NGSYMVKASL ANGSSYEKDL 840

WIDQKLTLS ASSPLIGVNS PTGATLTATL TSANGTPVEG QVINFSVTPE GATLSGGKVR 900

TNSSGQAPW LTSNKVGTYT VTASFHNGVT IQTQTIVKVT GNSSTAHVAS FIADPSTIAA 960

TNSDLSTLKA TVEDGSGNLI EGLTVYFALK SGSATLTSLT AVTDQNGIAT TSVRGAITGS 1020

VTVSAVTTAG GMQTVDITLV AGPADASQSV LKNNRSSLKG DFTDSAELHL VLHDISGNPI 1080

KVSEGLEFVQ SGTNAPYVQV SAIDYSKNFS GEYKATVTGG GEGIATLIPV LNGVHQAGLS 1140

TTIQFTRAED KIMSGTVLVN GANLPTTTFP SQGFTGAYYQ LNNDNFAPGK TAADYEFSSS 1200

ASWVDVDATG KVTFKNVGSK WERITATPKT GGPSYIYEIR VKSWWVNAGD AFMIYSLAEN 1260

FCSSNGYTLP LGDHLNHSRS RGIGSLYSEW GDMGHYTTEA GFHSNMYWSS SPANSNEQYV 1320

VSLATGDQSV FEKLGFAYAT CYKNL 1345

<212> Type : PRT
<211> Length : 1345

SequenceName : SEQ ID 282

SequenceDescription :



Sequence 



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

MSLIIDVISR KTSVKQTLIN PGDVTWIYE PSWQVHAQA SAVARYVREG NDLLIYMQDG 60
TVIRCNGYFL QAANTAEQSE LVFADGQQLT HITFADTAAG GLAPVELTAQ TTAIESIAPF 120
LDTVAQTSAF PWGWLAGAAV GGGALGALLA SGGDGDSKTE VINNPTPPAE PGNTATPSFLV 180
TDNQGDQRGI LATNDITDDT TPTFSGSGQA GATIQIKDSN GNTIASTQVD NNGHWSVSLP 240
TQSAGEHTWS WQIVGSTIT DAGSITLTID NSQASVQVAT TAGDNIINAS EQAAGFTLSG 300
TSSHLAQGTE LTVTLNGKTY TTSVGANGAW SVQVPTADAQ ALGEGNQAVL VSGKDATGNT 360
VTGAQLLTVD TQPPTLAINT IAQDNIISAA EHNVALVLSG TSNAEAGQTV TLTVNGKSHT 420
ATVGSDGTWQ VTLPATEVQA LAEGNYAWA SVSDRAGNTT SHSANFTVDT SAPWSVNTV 480
AGDDILNNAE QAVAQIISGQ VSGASPGDTV TVKLGTHVLT GIVLADGSWN VALDPAVTRT 540
LDRGANTIFV TVTDAAGNTG AASRAITLVG VSPLITINTV SGDDIISGAE KGAPLTLTGS 600
TQQAETGQTV TVTLAGQSFT TTVQADGSWS LTVPAAAMGN LPDGAVAITA SVTDLSGNTG 660
NTSRTITVDS QAPALSIDPL TADNIINAAE SGQDLPITGT TDAQPGQTVT VTLNGQTYQG 720
WQPDGTWSV TVPAANVGAL ADGNATVTAS VNDVAGMPSS VSRVALVDAT PPWTINPVA 780
TDNVINTPEH AQAQIISGTV TGAQAGDIVT VTLNNVDYTT WDGSGNWSL GVPASWSGL 840
ADGSYPVSVS VTDKAGNTGS QSLTVTVNTA APLIGINSIA GDDVINASEK GADLQITGTS 900
DQPVNTAITV TLNGQNYTTT TDASGNWSVT VPASAVTALG QANYTVTAAV TSDIGNSATA 960
SHNVLVDSAL PGVTINPVAT DDIINAAEAG VAQTISGQVT GAEDGDTVTI TLGGNTYTAT 1020
VGSNLTWSVD VPAADIQALG NGDLTVNASV TNQNGNTGSG TRDITIDANL PGLRVDTVAG 1080

DDWNI IEHG QALWTG S S S GLAESTPLTV TINNVEYTTA VQADGSWSVG VTAAQVSAWP 1140 

AGTVNIAVSG ESSAGNSVSI THPVTVDLTP AAITINTIAT DDVINAAE KG ADLTLSGTTT 12 00 

NVEPGQTVTV TFGGKNYTAS VASDGSWTAT VPAADLASLP EGSASALASV SNINGNSASA 12 60 

5 VHNYSVDSSA PTIIINTVAS DNIVNASEAD AGVTVSGSTT AEAGQIVTIT LNSPTVQTYQ 13 20 

ATVQADGSWS INI PAADLEA LTDGSHTLTA TVNDKAGNPA STTHNLAVDL TVPVLTINTI 13 80 

AGDDIINATE HGQALVISGS STGGEAGDW TVTLNS KTYT TTLDASGNWS VGVPAADVTA 1440 

LGSGPQTVTA TVTDAAGNSD N 1461 
<212> Type : PRT 
<211> Length : 1461

SequenceName : SEQ ID 283 

SequenceDescription : 



Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MNRIYRVIWN CTLQVFQACS ELTRRVGKTS TVNLRKSSGL TTKFSRLTLG VLLALSGSVS 60 

GASLEVDNGQ I TN I DTDVAY DAYLVGWYGT GVLNILAGGN ASLTTITTSV IGGNEDSEGT 12 0 

20 VNVLGGTWRL YDSGNNARPL NVGQSGTGTL NIKQKGHVDG GYLRLGTQAA GVGTVNVEGE 18 0 

DSVLTTELFE IGSYGTGSLN ITDKGYVTSS IVAILGYQAN SNGKWVE KG GEWLIKNNDS 240 

SIEFQIGNQG TGEATIREGG LITAENTIIG GNATGVGTLN VQDQDSVITV RRLYNGYFGN 3 00 

GAVNISNNGL INNKEYSLVG VQDGSHGWN VTDKGHWNFL GTGEAFRYIY IGDAGDGELN 3 60 

VSREGKVDSG I ITAGMKETG TGNL TVKDKN SVITNLGTNL GYDGHGEMNI SNEGLWSNG 42 0 

25 GSSLGYGETG VGKVSITTGG IWEVNKNVYT TIGVAGVGNL NISDGGKFVS QNITFLGDKA 48 0 

SGIGTLNLMD ATSSFDTVGI NVGNFGSGIV NVSNGATLNS TGYGFIGGNA SGKGIVNIST 54 0 

DSLWNLKTSS TNAQLLQVGV LGTGELNITT GGIVKARDTQ IALNDKSKGD VRVDGQNSLL 600 

ETFNMYVGTS GTGTLTLTNS GTLNVEGGEV YLGVFE PAVG TLNIGAAHGE AAADAGF I TN 660 

ATKVEFGSGE GVFVFNHTNN S DAGYQ VDML I TGDDKDGKV IHDAGHTVFN AGNTYSGKTL 72 0 

30 VNDGLLTTAS HTADGVTGMG SSEVTIASPG TLDILASTNS AGDYTLTNAL KGDGLMRVQL 780 

SSSDKMFGFT HATGTEFAGV AQLKDSTFTL ERDNTAALTH AMLQSDIENT TSVUVGEQSI 840 

GGLAMNGGTD IFDTDIPAAT LAEGYISVDT LWGASDYTW KGRNYQVNGT GDVLIGVPKP 900 

WNDPMANNPL TTLNLLEHDD NHVGVQLVKA QTVIGSGGSL TLRDLQGDEV EADKTLHIAQ 960 

NGTWAEGDY GFRLTTAPGD GLYVNYGLKA LNIHGGQKLT LAEHGGAYGA TADMSAKIGG 1020 

35 EGDLAINTVR QVSLSNGQND YQGATYVQMG TLRTDADGAL GNTRELNISN AAIVDLNGST 1080 

QTVETFTGQM GSTVLFKEGS LTVNKGGISQ GELTGGGNLN VTGGTLAVEG LNARYNALTS 1140 

VSPNAEVSLD NTQGLGRGNI ANDGLLTLKN VTGELRNSIS GKGIVSATAR TDVELDGDNS 12 00 

RFVGQFNIDT GSALSVNEQK NLGDASVINN GLLTISTERS WAMTHSISGS GDLTKLGTGI 1260 

LTLNNDS SAY QGTTDIVGGE IAFGSDSAIN TASQHINIHN SGVMSGNVTT AGDVNVMSGG 1320 

40 TLRVAKTTXG ESAATWRMAA RFK 1343 
<212> Type : PRT 
<211> Length : 1343 

SequenceName : SEQ ID 284 
SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

50 MG I KQHNGNT KADRLAELKI RSPSIQLIKF GAIGLNAlLF SPLLIAADTG SQYGTNITIN 60 
DGDR I TGDTA DPS GNL YGVM TPAGNTPGNI NLGNDVTVNV NDASGYAKGI IIQGKNSSLT 120 
ANRLTVDWG QTSAIGINLI GDYTHADLGT GSTIKSNDDG IIIGHSSTLT ATQFTIENSN 180 . 

GIGLTINDYG TSVDLGSGSK I KTDGS TGVY IGGLNGNNAN GAARFTATDL TIDVQGYSAM 240 
GINVQKNSW DLGTNSSIKT SGDNAHGLWS FGQVSANALT VDVTGAAANG VEVRGGTTTI 3 00 

55 GADSHISSAQ GGGLVTSGSD ATINFSGTAA QRNSIFSGGS YGASAQTATA VINMQNTDIT 3 60 

VDRNGSLALG LWALSGGRIT GDSLAITGAA GARGI YAMTN SQIDLTSDLV IDMSTPDQMA 420 
IATQHDDGYA ASRINASGRM LINGSVLSKG GLINLDMHPG SVWTGSSLSD NVNGGKLDVA 48 0 

MNNSVWNVTS NSNLDTLALS HSTVDFASHG STAGTFTTLN VENLSGNSTF IMRADWGEG 540 
NGVKPWA 547 

<212> Type : PRT

<211> Length : 547 

SequenceName : SEQ ID 285
SequenceDescription : 

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MGIDSRNDIP EGIATLGAFM GYSHSHIGFD RGGHGSVDSY SLGGYASWEH ESGFYLDGW 60 

KLNRFESNVA GKMSSGGAAN GSYHSNGLGG HIETGMRFTD GNWNLTPYAS LTGFTADNPE 12 0 

YHLSNGMESK SVDTRSIYRE LGATLSYNMR LGNGMEVEPW LKAAVRKEFV DDNRVKVNSD 180 

5 GNFVNDLSGR RGIYQAGIKA SFSSTLSGHL GVGYSNGAGM ESPWNAVAGV NWSF 234 



<212> Type : PRT 
<211> Length : 234 

SequenceName : SEQ ID 286
SequenceDescription :

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKKKVLAIAL VTVFTGMGVA QAADVTAQAV ATWSATAKKD TTSKLWTPL GSLAFQYAEG 60
IKGFNSQKGL FDVAIEGDST ATAFKLTSRL ITNTLTQLDT SGSTLNVGVD YNGAAVEKTG 120
DTVMIDTANG VLGGNLSPLA NGYNASNRTT AQDGFTFSII SGTTNGTTAV TDYSTLPEGI 180
WSGDVSVQFD ATWTS 195
<212> Type : PRT

<211> Length : 195 

SequenceName : SEQ ID 287 

SequenceDescription : 

Sequence



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString : 

MTAESYDDNY LDDEDADWTA TGQGQKSAGD TSFTLAWKPG EEGQKGLIGW FESGDVRAYK 60 
30 IRFPNGTVDV FRGWVSSIGK AVTAKEVITR TVKVTNVGKP SVAEERSKIT PVSAIKVTPT 120 

SGTVAKGKTT TLTVS FEPES ATDKTFRAVS ADPSKATISV KDMTITVNGV ATGKVQIPW 180 

SGNGQFAAVA EVTVTEAGAA G 2 01 

<212> Type : PRT

<211> Length : 201 
SequenceName : SEQ ID 288

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MTAESYDDNY LDDEDADWTA TGQGQKSAGD TSFTLAWKPG EEGQKGLIGW FESGDVRAYK 60 

IRFPNGTVDV FRGWVSSIGK AVTAKEVITR TVKVTNVGKP SVAEERSKIT PVSAIKVTPT 120 

SGTVAKGKTT TLTVS FEPES ATDKTFRAVS ADPSKATISV KDMTITVNGV ATGKVQIPW 180 

45 SGNGQFAAVA EVTVTEAGAA G 2 01 



<212> Type : PRT 

<211> Length : 201 

SequenceName : SEQ ID 289
SequenceDescription : 


Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

55 MLYNIPCRIY ILSTLSLCIS GIVSTATATS SETKISNEET LWTTNRSAS NLWESPATIQ 60 

VIDQQTLQNS TNAS I ADNLQ DIPGVEITDN SLAGRKQIRI RGEASSRVLI LIDGQEVTYQ 120 

RAGDNYGVGL LIDESALERV EWKGPYSVL YGSQAIGGIV NF I TKKGGDK LASGWKAVY 180 

NSATAGWEES IAVQGSIGGF DYRINGSYSD QGNRDTPDGR LPNTNYRNNS QGVWLGYNSG 240 

NHRFGLSLDR YRLATQTYYE DPDGSYEAFS VKIPKLEREK VGVFYDTDVD GDYLKKIHFD 3 00 

60 AYEQTIQRQF ANEVKTTQPV PSPMIQALTV HNKTDTHDKQ YTQAVTLQSH FSLPANNELV 3 60 

TGAQYKQDRV SQRSGGMTSS KSLTGFINKE TRTRSYYESE Q S TVS L FAQN DWQFADHWTW 42 0 

TMGVRQYWLS SKLTRGDGVS YTAGIISDTS L ARE S ASDHE MVTSTSLRYS GFDNLELRAA 4 80 

FAQGYVFPTL SQLFMQTSAG GSVTYGNPDL KAEHS NNFEL GARYNGNQWL IDSAVYYSEA 540 

KDYIASLICD GS I VCNGNTN SSRSSYYYYD NIDRAKTWGL E X S AEYNGWV FSPYISGNLI 600 

65 RRQYETSTLK TTNTGEPAIN GRIGLKHTLV MGQANIISDV F X RAAS S AKD DSNGTETNVP 660 

GWATLNFAVN TEFGNEDQYR INLALNNLTD KRYRTAHET I PAAGFNAAIG FVWNF 715 



<212> Type : PRT

<211> Length : 715 

SequenceName : SEQ ID 290 
SequenceDescription : 



Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MTKMSRYALI TALAMFLAGC VGQREPAPVE EVKPAPEQPA EPQQPVPTVP SVPTIPQQPG 60 
PIEHEDQTAP PAPHIRHYDW NGAMQPMVSK MLGADGVTAG SVLLVDSVNN RTNGS LNAAE 12 0 

ATETLRNALA NNGKFTLVSA QQLSMAKQQL GLSPQDSLGT RSKAIG-IARN VGAHYVLYSS 180 
ASGNVNAPTL QMQLMLVQTG EIIWSGKGAV SQQ 213 
<212> Type : PRT 
<211> Length : 213 

SequenceName : SEQ ID 291 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MKSKVLALLI PALLGAGAAH AAEVYNKDGN KLDLYGKVDG LHYFSDNSAK DGDQSYARLG 60 
FKGETQINDQ LTGYGQWEYN IQANNTESSK NQSWTRLAFA GLKFSDYGSF DYGRNYGLDR 120 
YAA 123 
<212> Type : PRT 
<211> Length : 123 

SequenceName : SEQ ID 292 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MATPNPLEPV KGAGTTLWVY NGKGDAYANP LSDDDWQRLA KVKDLTPGEM TAEPYDDNYL 60 
DDEDADWTAT GQGQKSAGDT SFTLAWKPGE EGQKGLI GWF ESGDVRAYKI RFPNGTVDVF 120 
RGWVSSIGKA VTAKEVITRT VKVTNVGKPS VAEERSEITP ATAI KVTP TS GTVAKGKTTT 180 
LTVSFEPESA TDKTFRAVSA DPSKATISVK DMTITVNGVA TGKVQIPWS GNGQFAAVAE 24 0 

VTVTEAGAAG 2 50 

<212> Type : PRT 
<211> Length : 250 

SequenceName : SEQ ID 293 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

MATPNPLEPV KGAGTTLWVY NGKGDAYANP LSDDDWQRLA KVKDLXPGEM TAEPYDDNYL 60 
DDEDADWTAT GQGQKSAGDT SFTLAWKPGE EGQKGLIGWF ESGDVRAYKI RFPNGTVDVF 120 
RGWVSSIGKA VTAKEVITRT VKVTNVGKPS VAEERSEITP ATAI KVTP TS GTVAKGKTTT 18 0 

LTVSFEPESA TDKTFRAVSA DPSKATISVK DMTITVNGVA TGKVQIPWS GNGQFAAVAE 240 
VTVTEAGAAG 2 50 

<212> Type : PRT 
<211> Length : 250 

SequenceName : SEQ ID 294 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString : 

MGWTDMLPEF GGDSYTNADN FMTGRANGVA TYRNTDFFGL VNGLNFAVQY QGNNE GAS NG 60 

QEGTNNGRDV RHENGD GWGL STTYDLGMGF SAGAAYTSSD RTNDQVNHTA AGGDKADAWT 120 

AGLKYDANNI YLATMYSETR NMTPFGDSDY AVANKTQNFE VTAQYQFDFG LRPAVSFLMS 180 

KGRDLHAAGG ADNPAGVDDK DLVKYADVGA TYYFNKNMST YVDYKINLLD EDDSFYAANG 240 

ISTDDIVALG LVYQF 255 
<212> Type : PRT

<211> Length : 255 

SequenceName : SEQ ID 295 
SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString :

MGFIMKLTKT ALCTALFATF TFSANAQTYP DLPVGIKGGT GALIGDTVYV GLGSGGDKFY 60 

TLDLKDPSAQ WKEIATFPGG ERNQPVAAAV DGKLYVFGGL QKNEKGELQL VNDAYRYNPS 12 0 

DNTWMKLPTR SPRGLVGSSG ASHGDKVYIL GGSNIaSIFNG FFQDTVAAGE DKAKKDEIAA 18 0 

AYFDQRPEDY FFTTELLSYE PSTNKWRNEG RIPFSGRAGA AFTIQGNELV WNGEIKPGL 24 0 

RTAETHQGKF TAKGVQWKNL PDIiPAPKGKS QnniAGALSG YSNGKYLVTG GANFPGSIKQ 3 00 

FKEGKLHAHK GLSKAWHNEV YTLNNGKWRI VGELPMNIGY GFSVSYNNKV LLIGGETDGG 36 0 

KALTSVKAIS YDGKKLTIE 379 
<212> Type : PRT 
<211> Length : 379

SequenceName : SEQ ID 296 

SequenceDescription : 

Sequence 



<213> OrganismName : Haemophilus influenzae Rd 
<400> PreSequenceString :

MGEQYMLTTI LSFLIVTTW AYVSWLKTKG DDLKSSKGYF LAGRGLSGLV IGCSMVLTSL 60 
STEQLIGVNA VSYKGNFSVI AWTVPTVIPL CFLALYIIGW L 101 
<212> Type : PRT 
<211> Length : 101 

SequenceName : SEQ ID 297 

SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori CF99 
<400> PreSequenceString :

MKNQHKNPLT KALMKTYPYN HFLFFCFILG AFLLGLLSPA YALSIITTKE IDANLLNGAI 60 

ESRWLGKRV FKVEAHGFYF RNNATNSIDI EITSLIiRDNQ SFPLTSSAKT SLKIPPNAKI 120 

KKSTILVLKG ENAEEVAKIL GVSKEEYQKL ENIAQTKAAN DPMYANTPFS NGSDSSFYDN 18 0 

NPNSPSNNAI NGKDGANGSN GYGANGNDGV NGISGSNGAN GSHSNNNAIG SGIDTDGVLG 240 

VDGVNGSSSS SGGSVGGYEN NFTNHGSTNN NTGGYDNFNN GSSSGGSLGN GGLFPIPFGN 300 

GDTNNSNNST NTTSPTNGSS SNNATNPSSQ ENNYSSQYCK VPELSPNNTM KLDVIAKDGS 360 

CIS MNALRDD TKCAYRYDFE AGKAIKQTQY YYVDRENKTQ NIGGCVDLQG AQYAMQLYKD 420 

DSKCALQTTS DKGYGMGKTQ TFQTEIVFRG MDNL I HVAVP CSDYARVQDR IVRYEKNDKT 480 

QTLTPIVDQY YNDPNNPNKQ EILNRGIATQ LSSQYQEFAC GQWEYNDAKL EAKRPTMLKS 540 

YNKLNGEWVE VTPCNFEAGI KSGAWSPYV MGVPSSKVLS DITTSHYFRI ERKNYGEREQ 60 0 

CQKLYGVNRC QPQYSILILV SPIGAPLTKP LPPKPLNLIY AQPKIMKNTP QPIILSPLKP 660 

PSTGLKAF 668
<212> Type : PRT 
<211> Length : 668 

SequenceName : SEQ ID 298 

SequenceDescription : 



Sequence 



<213> OrganismName : Helicobacter pylori J99 
<400> PreSequenceString : 

MPVIRVLVML ATMMMKLVKT AKEKKVFKNV GISIMGIAFW EAIKDSIKKQ IKKSDWICGN 60 

VKTADDYLKT HPNSWFNSAI GVTAI TAMLM NVCFADDQSK KEVAQAQKEA ENARD RANKS 120 

GIELEQEEQK TEQEKQKTEQ EKQKTEQEKQ KTEQEKQKTE QEKQKTSNIE TNNQI KVEQE 180 

QQKTEQEKQK TNNTQKDLVN KAEQNCQENH NQFFXKKLGI KAGIAIEIEA ECKTPKPTKT 240 

NQTPIQPKHL PNSKQPHSQR GSKAQELIAY LQKELESLPY SQKAIAKQVD FYRPSS IAYL 300 

ELDPRDFNAT EEWQKENLKI RSKAQAKMLE MRSLKPDPQA HLSTSQSLLL VQKI FADVSK 3 60 

EIKWANTEK KVEKAGYGYS KRM 383 
<212> Type : PRT 
<211> Length : 383 

SequenceName : SEQ ID 299 
SequenceDescription :
Sequence 

<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString : 

MNYPNLPNSA LEISEQPEVK EITNELLKQL QNALRSNAHF SEQVELSLKC IVRILEVLLS 60 
LDFFKNANEI DSSLRNSIEW LTNAGESLKL KMKEYERFFS EFNTSMHANE QEVTNTLNAN 120 
AENIKSEIKK LENQLIETTT RLLTSYQIFL NQARDNANNQ ITKNKTQSLE AITQAKNNAN 180 

10 NEISNNQTQA ITNITEAKTN ANNEISNNQT QAITNINEAK ESATTQINAN KQEAINNITQ 240 
EKTQATS E I T EAKKTDHYQN IDFFEFE 267 
<212> Type : PRT 
<211> Length : 267 

SequenceName : SEQ ID 300

SequenceDescription :

Sequence 



<213> OrganismName : Helicobacter pylori J99

<400> PreSequenceString :

MKFFSKDLFK KVTPLFLSVY FLSPTLTQAK SRFYVASQYQ VGKMIMKKYN DLKRTIEGAS 60 
FSLGWEINPT NYWFYSRYYF FMDYGNVILN KRTGAQANMF TYGFGGDLIM EYNKNPLYVF 120 
SLFYGMQVAE NTWTISKHSA NFIIDDWRSI QGFSLKTSNF RMLGLVGFKF QTVLFHHDAS 180 
IEVGIKWPFA FEYDSPFVRL FSVFISHTFY L 211 

<212> Type : PRT

<211> Length : 211 

SequenceName : SEQ ID 301
SequenceDescription : 

Sequence



<213> OrganismName : Helicobacter pylori J99
<400> PreSequenceString :

MKKFTLSLFL CCTLLNAEED 1FRNNTNETD LTNS FEHGKE NNNLIPAKSD SLESFKEQEN 60 
35 KEKAKQLMDL KALQSVYFSK NRKLQDNNFN VLYVAGNTNK IRLRYAMTTT FIFDNDPIIY 120 
VSLGDPSDFE LTYPTNDHYD LSNMLVIKPL LIGVDTNLTV VGASGTIYTL LFV 173 

<212> Type : PRT 
<211> Length : 173 
SequenceName : SEQ ID 302

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MLDYVPWIGN GYRYGNNHRG SNSSTSGVTT QGQSQNASSN EPAPTFSNVG VGL KANVNGT 60 

LSGSRTTPNQ QGTPWLTLDQ ANLQLWTGAG WRNDKNGQSD ENYTNFASAK GSTNQQGSTT 120 

GGSAGNPDSL KQDKADKSGD SVTVAEATSG DNLTNYTNLP PTSPPHPTDR TRCHSPTRTT 18 0 

50 PSGCSCSCAA CWAASRCWSI RVGKMI TVS L IPPTKNGLTP N 221 



<212> Type : PRT 

<211> Length : 221 

SequenceName : SEQ ID 303
SequenceDescription :


Sequence 



<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString :

60 MDDITAPQTS AGSSSGTSTN TSGSRSFLPT FSNVGVGLKA NVQGTLGGRQ TTTTGNNIPK 60 

WATLDQANLQ LWTGAGWRND KTTSGSTGNA NDTKFTSATG SGSGQGSSSG TNTSAGNPDG 12 0 

LQADKVD QNG QVKTSVQEAT SGDNLTNYTN LPPANLTPTA DWPNALSFTN KNNAQRAQLF 18 0 

LRGLLGSIPV LVNKSGQDDN SKFKAEDQKW SYTDLQSDQT KLNLPAYGEV NGLLNPALVE 24 0 

TYFGNTRASG SGSNTTSSPG IGFKIPEQSG TNTTSKAVLI TPGLAWTPQD VGNIWSGTS 3 00 

65 FSFQLGGWLV TFTDFIKPRA GYLGLQLTGL DVS EATQREL IWAKRPWAAF RGSWVNRLGR 3 60 

VESVWDFKGV WADQAQLAAQ AATSSTTTTA TGATLPEHPN ALAYQISYTD KDSYKASTQG , 420 

SGQTNSQNNS PYLHFIKPKK VESTTQLDQG LKNLLDPNQV RTKLRQSFGT DHSTQPQPQS 480 
LKTTTPVFGR SSGNLSSVFS GGGAGGGSSG SGQSGVDLSP VERVSGH 527
<212> Type : PRT 
<211> Length : 527 

SequenceName : SEQ ID 304
SequenceDescription :

Sequence 



<213> OrganismName : Mycoplasma pneumoniae 

<400> PreSequenceString :

MLKLAVGIFI SPTLTRFSTG FNXjAGSVLDQ VLDYVPWIGM GHRYGNNHRG VDDITAPKTG . 60 

AGSSSGTSTN TSGSRSFLPT FS2SFVGVGLKA NVQGTLGGSQ TTTTGKDIPK WPTLDPANLQ 120 

LWTGAGWRND KASNKQSDEN HTTFKSATGS GQQGGSTTGG SAGNPDSLKQ DKISKSGQNL 180 
TTQDGAPQSN STTESASNYD HLX>PNLTPTS DWPFALSFTN KNNAQRAQBF LRGLLGSIPV , . .. 240 

15 LVNRSGSDDS NKFQATDQKW SYTDLKSDQT KLNLPAYGEV NGLLNPALVE TYFGTTRAGG 3 00 

SGSNTTSSPG IGFKIPEQNN DSKAVLITPG LAWTPQDVGN LWS GTS LS F QLGGWLVTFT 3 60 

DFVKPRAGYL GLQLTGLDAS DATQRALIWA KRPWAAFRGS WVNRLGRVES VWDLKGVWQD 42 0 

QAQAAAQAAT TAAATGDALP EHiPNALAYQI SSTDKDSYKA STQSSGQTNS QNTSPYLHLI 480 

KPKKVENTTQ LDQGLKTCWT PTRFAPSCAK ALVQTIPPKP NPNPSKQPHR CLGRIWTLA 540 

20 VCLWGVLEE QTAPIRWTSP PLNGWVGGLW GNYPVGVGGI WRILKVCKT LLFISIFISI 600 

FFLNCSLTLF IWTTASLATG LrWGHFTST TTTLKRQQFS YTRPDEVALR HTNAINPRLT 660 

PWTYRNTSFS SLPLTGENPG AWALVRDNTA KGITAGSGSQ QTTYDPTRTE AAIiTTATTFV 720 

LRRYDLAGRC TTSTFRS 737 
<212> Type : PRT 

<211> Length : 737

SequenceName : SEQ ID 305
SequenceDescription : 

Sequence 

<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString :

MLDYIPWIGN GHRYGNDHRG SRSSTSGVTT QGQQSQNASG TEPASTFSNV GVGLKANVQG 60
TLGGSQTTTT GKDIPKWPTL DQANLQLWTG AGWRNDKASS GQSDENHTKF TSATGSGQQG 120
SSSGTTNSAG NPDSLKQDKV DKSGDSVTVA ETTSGDNLTN YTNLPPNLTP TADWPNALSF 180
TNKNNAQRAQ LFLRALLGSI PVLVNKSGQD DSNKFQATDQ KWSYTELKSD QTKLNLPAYG 240
EVNGLLNPAL VEVYGLSSTQ GSSTGAGGAG GNTGGDTNTQ TYARPGIGFK LPSTDSESSK 300
ATLITPGLAW TAQDVGNLW SGTSLSFQLG GWLVTFTDFI KPRSGYLGLQ LTGLDANDSD 360
QRELIWAPPA LNRLSWQLGQ PI*GPRGECVG FQGGVGGSSS VRLASSYKYH HRNEGYLIGA 420
HQCFGLSGEL YRPGFVQGFH SKIiRPKPKHL PLPALGAGEK SRFLW 465

<212> Type : PRT
<211> Length : 465

SequenceName : SEQ ID 306
SequenceDescription :

Sequence

<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MLGSIPVLVN RSGSDSNKFQ ATDQKWSYTD LQSDQTKLNL SAYGEVNGLL NPALVETYFG 60
TTRTSSTANQ NSTTVPGIGF KIPEQNNDSK ATLITPGLAW TPQDVGNLW SGTTVSFQLG 120
GWLVTFTDFV KPRAGYLGLQ LSGLNASDSD QRELIWAPRP WAAFRGSWVN RLGRVESVWD 180
LKGVWADQAQ LAAQAATSST TTTATGATLP EHPNALAYQI SYTDKDSYKA STQGSGQTNS 240
QNNSLYLHLI KPKKVESTTQ LDQGLKNLLD PNQVRTKLRQ SFGTDHSTQP QPQSLKTTTP 300
VFGAMSGNLG SVLSGGGAGG AGSTNSVDLS PVERVSGSLT INRNFSY 347
<212> Type : PRT
<211> Length : 347

SequenceName : SEQ ID 307
SequenceDescription :

Sequence

<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MGQQGQSGTS AGNPDSLKQD KXSKSGDSLT TQDGNATGQQ EATNYTNLPP NLTPTADWPN 60
ALSFTNKNNA HRAQLFLRGL LGSIPVLVNR SGSDSNKFQA TDQKWSYTDL QSDQTKLNLP 120
AYGEVNGLLN PALVETYFGN TRAGGSGSNT TSSPGIGFKI PEQNNDSKAT LITPGLAWTP 180
QDVGNLWSG TSLSFQLGGW LVSFTDFIKP RAGYLGLQLS GLDASDSDQR ELIWAKRPWA 240

AFRGSWVNRL GRVESVWDLK GVWADQAQLA AQAATSEASG SALAPHPNAL AFQVSWEAS 3 00 

AYSSSTSSSG SGSSSNTSPY LHLIKPKKVE STTQLDQGLK NLLDPNQVRT KLRQS FGTDH 3 60 

STQPQSLKTT TPVFGTSSGN IGSVLSGGGA GGGSSGSGQS GVDLSPVERV SGH 413 

<212> Type : PRT 
<211> Length : 413 

SequenceName : SEQ ID 308 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :
15 MGLQLSGLDA SDSDQRELIW AKRPWAAFRG SWVNRLGRVE SVWDLKGVWA DQAHSAVSES 60 
QAATSSTTTT ATGDTLPEHP NALAYQISST DKDSYKASTQ GSGQTNSQNT SPYLHLIKPK 120 
KVTASDKLDD DLKNLLDPNE VRVKLRQSFG TDHSTQPQPQ PLKTTTPVFG TNSGNLGSVL 180 
SGGGTTQDSS TTNQLSPVQR VSGWLVGQLP STSDGNTSST NNLAPNTNTG NEWGVGDLS 240 
KRASIESSRL WIALKP 256 
<212> Type : PRT

<211> Length : 256 

SequenceName : SEQ ID 309 

SequenceDescription : 

Sequence



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MRDNTAKGIT AGSGSQQTTY DPARTEATLT TTTFALRRYD LAGRALYDLD FSKLNPQTPT 60
RDANCQITFN PFGGFGLSGS APQQWNEVKN KYPVEVAQDP TDPYRFAVLL VPRSWYYEQ 120
LQRGLALPNQ GSSSGSGQQN TTIGAYGLKV KNAEADTAKS NEKLQGDESK SSNGSSSTST 180
TTQRGSTNSD TKVKALKIEV KKKSDSEDNG QLQLEKNDLA NAPIKRGEES GQSVQLKADD 240
FGTAPSSSGS GGNSNPGSPT PWRPWLATEQ IHKDLPKWSA SILILYDAPY ARNRTAIDRV 300
DHUDPKVMTA NYPPSWRMPK WNHHGLWDWK ARDVLFQTTG FDESNTSNTK QGFQKEADSD 360
KSAPIALPFE AYFANIGNLT WFGQALLVFG GNGHVTKSAH TAPLSIWLYI YLVKAVTFRL 420
LLAJSrSLLSKS NIYKKTAN 438
<212> Type : PRT
<211> Length : 438

SequenceName : SEQ ID 310
SequenceDescription :



Sequence 



<213> OrganismName : Mycoplasma pneumoniae

<400> PreSequenceString :

MRDNIAKGIT AGSNTQQTTY DPTRTEATLT TATTFALRRY DLAGRALYDL DFSKLNPQTP 60 
TRDQTGQITF NPFGGFGLSG AAPQQWNEVK DKVPVEVAQD PSNPYRFAVL LVPRSWYYE 12 0 

QLQRGLALPN QGSSSGSGQQ NTTIGAYGLK VKNAEADTAK SNEKLQGYES KSSNGSSSTS 180 
TTQRGGSSNE NKVKALQVAV KKKSGSQGNS GDQGTEQVEL E SNDLANAP I KRGSNNNQQV 240 

50 QLICADDFGTA PSSSGSGTQD GTPTPWTPWL TTEQIHNDPA KFAASILILY DAPYARNRTA 3 00 

IDRVDHLDPK VMTANYPPSW RTPKWNHHGL WDWKARDVLL QTTGFFNPRR HPEWFDGGQT 360 
VADNE KTGFD VDNSENTKQG FQKEADSDKS APIALPFEAY FANIGNLTWF EQALLVFGIC 420 
LS 422 
<212> Type : PRT 

<211> Length : 422

SequenceName : SEQ ID 311 
SequenceDescription : 



Sequence 


<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString :

MLWPFRWVWW KRVLTSQTRA PAKPNPLTVP PTCTWWSLRK LPNPTKLDDD LKNLLDPNEV 60 
RARMLKS FGT ENFTQPQPQP QALKTTTPVF GTSSGNLGSV LSGGGYHAGL KHHQSTVTRS 120 
65 TGEWVDR 127 
<212> Type : PRT 
<211> Length : 127 
SequenceName : SEQ ID 312
SequenceDescription : 

Sequence 

<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString :

MRDNSAKGIT AGSESQQTTY DPTRTEAALT ASTTFALRRY DLAGRALYDL DFSRLNPQTP 60 

TRDQTGQITF NPFGGFGLSG AAPQQWNEVK NKVPVEVAQD PSNPYRFAVL LVPRSWYYE 120 

QLQRGLALPN QGSSSGSGQQ NTTIGAYGLK VKNAEADTAK SNEKLQGDES KSSNGSSSTS 180 

TTTQRGGSSG DTKVKALQVA VKKKSGSQGN SGEQGTEQVE LESNDLANAP IKRGEESGQS 240 

VQIjKAADFGT TPSSSGSGGN SNPGSPTPWR PWLATEQIHK DLPKWSASIL ILYDAPYARN 3 00 

RTAIDRVDHL DPKVMTANYP PSWRTPKWNH HGLWDWKARD VLLQTTGFFN SRRHPEWFDQ 3 60 

GQ^TABNTQT GFDTDDTDNK KTRLSKGSWL RQAGPDEPPV WSVLRQHWQP KTVRASAFGV 4Z0 

WDLFVLIN 428 
<212> Type : PRT
<211> Length : 428 

SequenceName : SEQ ID 313 

SequenceDescription : 

Sequence 

<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MFGLKVKNAE ADTAKSNEKL QGAEATGSST TSGSGQSTQR GGS SGDTKVK ALQVAVKKKS 60 
GSQGNSGDQG TEQVELESND LANAPIKRGS NPASPTQGSR LRHHPIQFGI WSIRHPHPLK 120 
AVA.CDRAWS Q GPPQMIRLDP HSVRCALCL 149 
<212> Type : PRT 
<211> Length : 149 

SequenceName : SEQ ID 314 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString :

MFGXi KVKDAT VDSSKQSTES LKGEESSSSS TTSSTSTTQR GGS SGDTKVK ALQVAVKKKS 60 
DSEDNGQIEL ETNNLANAPI KRGSNNNQQV QLKADDFGTS PSSSESGQSG TPTPWTPWLA 12 0 

TEQIHKDLPK WSASILILYD APYARNRTAI DRVDHLDPKV MTANYPPSWR TPKWNHHGLW 180 
DWKARDVLVQ TTGFFNPRRH PDWFDQGQAV AENTQTGFDT DDTDNKKQGF RKQGEQSPAP 240 
IALiPFEAYFA NIGNLTWFGQ ALLVFGI CLS 270 
<212> Type : PRT 
<211> Length : 270 

SequenceName : SEQ ID 315 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycoplasma pneumoniae 
<400> PreSequenceString :

MGSQNQGSTT TTSAGNPDSL VTDKVDQKGQ VQTSGQNLSD TNYTNLSPNF TPTSDWPNAL 60 
SFTISTKNNAQR AQLFLHGLLG SIPVLVNKSG ENNEKFQATD QKWSYTELKS DQTKLNL PAY 120 
GEVTSTGLLNPA LVETYFGTTR TSSTANQNST TVPGIGFKIP EQNNDSKAVL ITPGLAWTPQ 18 0 

DVGNLWSGT SFSFQLGGWL VSFTDFVKPR AGYLGLQLTG LDASDATQRA LIWAPPALSG 24 0 

LSWQLGQPVG PRGECVGFEG GVGGSSSVRL ARI YHHRNRG YLTGAPECFG LSGECGGSEC 3 00 

LQAKHELRPN PIH 313 
<212> Type : PRT 
<211> Length : 313 

SequenceName : SEQ ID 316 

SequenceDescription : 

Sequence



<213> OrganismName : Mycoplasma pneumoniae
<400> PreSequenceString :

MS F GLVGTVN NNGWKSPFRH ETKYRAGYDK FKYYKTHYRG AKKAGTNDDR WRWTAWFDLD 60 
FAHQKIVLIE RGELHRQADL KKSDPATNET SKTVWGSIKE KLLQNVNNLH SEKGVFLWFR 12 0 
QSGFTTTRN

<212> Type : PRT 

<211> Length : 129 

SequenceName : SEQ ID 317 
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv

<400> PreSequenceString :

MAE P LAVDP T GLSAAAAKLA GLVFPQPPAP IAVSGTDSW AAINETMPSI ESLVSDGLPG 60 

VKAALTRTAS NMNAAADVYA KTDQSLGTSL SQYAFGSSGE GLAGVASVGG QPSQATQLLS 120 

TPVSQVTTQL GETAAELAPR WATVPQLVQ LAPHAVQMSQ NASPIAQTIS QTAQQAAQSA 180 
QGGSG^MF^Q LASAEKPATE QAEPVHEVTN DPQGDQGDVQ PAEWAAARD ^GA^-^QQ • 240.. 

15 PGGGVPAQAM DTGAGARPAA SPLAAPVDPS TPAPSTTTTL 28 0 



<212> Type : PRT 

<211> Length : 280

SequenceName : SEQ ID 318
SequenceDescription :


Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

25 MRYLIATAVL VAWLVGWPA AGAPPSCAGL GGTVQAGQIC HVHASGPKYM LDMTFPVDYP 60 
DQQALTDYIT QNRDGFVNVA QGSPLRDQPY QMDATSEQHS SGQPPQATRS WLKFFQDLG 12 0 

GAHPSTWYKA FNYNLATSQP ITFDTLFVPG TTPLDSIYPI VQRELARQTG FGAAILPSTG 180 
LDPAHYQNFA I TDD SL I F YF AQGELLPSFV GACQAQVPRS AIPPLAI 227 
<212> Type : PRT 

<211> Length : 227

SequenceName : SEQ ID 319 
SequenceDescription : 

Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MKMVKSIAAG LTAAAAIGAA AAGVTS IMAG GPWYQMQPV VFGAPLPLDP ASAPDVPTAA 60 
QLTSLLNSLA DPNVSFANKG SLVEGGIGGT EARIADHKLK KAAEHGDLPL SFSVTNIQPA 120 
40 AAGSATADVS VSGPKLSSPV TQNVTFVNQG GWMLSRASAM ELLQAAGN 168 
<212> Type : PRT 
<211> Length : 168 

SequenceName : SEQ ID 320 

SequenceDescription : 


Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

50 MTYSPGNPGY PQAQPAGSYG GVTPSFAHAD EGAS KL PMYL NIAVAVLGLA AYFAS FGPMF 60 

TLSTELGGGD GAVSGDTGLP VGVALLAALL AGVALVPKAK SHVTWAVLG VLGVFLMVSA 12 0 

TFNKPSAYST GWALWWLAF IVFQAVAAVL ALLVETGAIT APAPRPKFDP YGQYGRYGQY 18 0 

GQ YGVQPGGY YGQQGAQQAA GLQSPGPQQS PQPPGYGSQY GGYSSSPSQS GSGYTAQPPA 240 

QPPAQSGSQQ SHQGPSTPPT GFPSFSPPPP VSAGTGSQAG SAPVNYSNPS GGEQSSSPGG 3 00 

55 APV 3 03 



<212> Type : PRT
<211> Length : 303

SequenceName : SEQ ID 321 

SequenceDescription : 


Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :
65 MKGPGVSDCV ATVRHDNVFA IAAGLRWSAA VP P LHKGDAV TKLLVGAIAG GMLACAAILG 60 
DGIASADTAL IVPGTAPSPY GPLRSLYHFN PAMQPQIGAN YYNPTATRHV VSYPGSFWPV 120 
TGLNSPTVGS SVSAGTNNLD AAIRSTDGPI FVAGLSQGTL VLDREQARLA NDPTAPPPGQ 180 
LTFIKAGDPN NLLWRAFRPG THVPIIDYTV PAPAESQYDT INIVGQYDIF SDPPNRPGNL 240
LADLNAIAAG GYYGHSATAF SDPARVAPRD ivrrTNSLGA TTTTYFIRTD QLPLVRALVD 3 00 

MAGLPPQAAG TVDAALRPII DRAYQPGPAP AVNPRDLVQG IRGIPAIAPA IAIPIGSTTG 3 60 

ASAATSTAAA TAAATNALRG ANVGP GANKA LSMVRGLLPK GKKH 404 
<212> Type : PRT

<211> Length : 404 

SequenceName : SEQ ID 322

SequenceDescription : 

Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString : 

M.qt.t.z-tt * t . p p p FDA T PN 2 1 EDLD vtLYft ^"^"T-h r ^- C*T*K AQLGEI ""T»wr.r.», «o 60 

15 KAPHCPAESD QTPAGAAGDG DLPEVGGRVT SPPQPPVAAL TGYSANIGGL SVPHSWlMLPP 12 0 

AVRQVAAMFP GATPMYMTGS SDGSYAGLAA AGLAGTGLAG LAARGGSAPT PAAAAPAGAG 180 
GAGPAATRPA AQQTPAVPAA AAGSAIPGLP PGLPPGWAN LAATLAAIPG ATIIWPPSP 24 0 

NANQ 244 
<212> Type : PRT 
<211> Length : 244

SequenceName : SEQ ID 323 
SequenceDescription : 



Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString : 

MDVALGVAVT DRVARLALVD SAAPGTVIDQ FVLDVAEHPV EVLTETWGT DRSLAGENHR 60 
LVATRL CWPD QAKADELQHA LQDS GVHDVA VISEAQAATA LVGAAHAGSA VLLVGDETAT 120 

30 L S WGDPDAP PTMVAVAPVA GADATSTVDT LMARLGDQAL APGDVFLVGR SAEHTTVLAD 18 0 

QLRAAS TMRV QTPDDPTFAL ARGAAMAAGA ATMAHPALVA DATTSLPRAE AGQSGSEGEQ 240 
LAYS QAS DYE LLPVDEYEEH DEYGAAADRS APLSRRSLLI GMAWAFAVI GFASLAVAVA 3 00 

VTIRPTAAb'K PVEGHQNAQP GKFMPLL PTQ QQAPV^PPPP DDPTAGFQGG TIPAVQrTWP 3 60 

RPGTSPGVGG TPASPAPEAP AVPGWPAPV PIPVPITIPP FPGWQPGMPT IPTAPPTTPV 42 0 

35 TTSATTPPTT PPTTPVTTPP TTPPTTPVTT PPTTPPTTPV TTPPTTVAPT TVAPTTVAPT 480 
TVAPTTVAPA TATPTTVAPQ PTQQPTQQPT QQivlPXQOQTV APQTVAPA^Q PPSGGRNGSG 54 C 

GGDLFGGF 548 
<212> Type : PRT
<211> Length : 548

SequenceName : SEQ ID 324

SequenceDescription : 

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MKNARTTLIA AAIAGTLVTT SPAGIANADD AGLD PNAAAG PDAVGFD PNL PPAPDAAPVD 60 
TPPAPEDAGF DPNLPPPLAP DFLSPPAEEA PPVPVAYSVN WDAIAQCESG GNWS INTGNG 120 
YYGGLRFTAG TWRANGGSGS AAMASREEQI RVAENVLRSQ GIRAWPVCGR RG 172 


<212> Type : PRT 
<211> Length : 172 

SequenceName : SEQ ID 325 

SequenceDescription : 


Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :
60 MTRLIPGCTL VGLMLTLL PA PTSAAGSNTA TTLFPVDEVT QLETHTFLDC HPNGSCDFVA 60 
GANLRTPDGP TGFP PGLWAR QTTEIRSTNR LAYLDAHATS QFERVMKAGG SDVITTVYFG 12 0 

EGPPDKYQTT GVIDSTNWST GQPMTDVNVI VCTHMQWYP GVNLTSPSTC AQANFS 176 

<212> Type : PRT 
<211> Length : 176

SequenceName : SEQ ID 326 
SequenceDescription : 
Sequence

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MTPGLLTTAG AGRPRDRCAR IVCTVFIETA WATMFVALL GLSTISSKAD DIDWDAIAQC 60 
ESGGNWAANT GNGLYGGLQI SQATWDSNGG VGSPAAASPQ QQIEVADNIM KTQGPGAWPK 12 0 

CSSCSQGDAP LGSLTHILTF LAAETGGCSG SRDD 154 
<212> Type : PRT 
<211> Length : 154

SequenceName : SEQ ID 327 
SequenceDescription : 



Sequence



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :

MMQQAVSGIT GALGGAVGGV MGPLTQLPQQ AMQAGQGAMQ PLMSALQQTY GAEGLDVADG 60 

ARLVDSIEGE PGLGGEPGAG DVGAGGGGGG TTPTGYLGPP PVPTSSPPTT PAGAPAKSVT 120 
20 PDPVSGTPRA SGPAGMTGMP MVP PGALGAG AEGANKDKPV EKRVTGCAEW STGQGPLNST 18 0 

AECSGEICRR QAGGHQVDAT DPCCAERRQG 210 

<212> Type : PRT 

<211> Length : 210 

SequenceName : SEQ ID 328 
SequenceDescription :

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString :

MIRELVTTAA ITGAAIGGAP VAGADPQRYD GDVPGMNYDA SLGAPCSSWE RFIFGRGPSG 60 
QAEACHFPPP NQFPPAETGY WVISYPLYGV QQVGAPCPKP QAAAQSPDGL PMLCLGARGW 120 
QPGWFTGAGF FPPEP 135 
<212> Type : PRT 
<211> Length : 135

SequenceName : SEQ ID 329 
SequenceDescription : 



Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MKTTGTTIKL GIVWLVLSVF TVMIIWFGQ VRFHHTTGYS AVFTHVSGLR AGQ FVRAAGV 60 
EVGKVAKVTL I DGDKQVLVD FTVDRSLSLD QATTASIRYL NLIGDRYLEL GRGHS GQRLA 12 0 

45 PGATIPLEHT HPALDLDALL GGFRPLFQTL DPDKVNSIAS SIITVFQGQG ATINDILDQT 18 0 

ASLTATLADR DHAIGEWNN LNTVLATTVK HQTEFDRTVD KLEVLITGLK NRADPLAAAA 240 
AHISSAAGTL ADLLGRIVHC CTAASGTSRA SSSRS 275 
<212> Type : PRT 
<211> Length : 275 

SequenceName : SEQ ID 330

SequenceDescription : 

Sequence 

<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MTPRSLVRIV GVWATTLAL VSAPAGGRAA HADPCSDIAV VFARGTHQAS GLGDVGEAFV 60 
DSLTSQVGGR SIGVYAVNYP ASDDYRASAS NGSDDASAHI QRTVAS CPNT RIVLGGYSQG 12 0 

ATVIDLSTSA MPPAVADHVA AVALFGEPSS GFSSMLWGGG SLPTIGPLYS SKTINLCAPD 180 
60 DPICTGGGNI MAHVSYVQSG MTSQAATFAA NRLDHAG 217 
<212> Type : PRT 
<211> Length : 217 

SequenceName : SEQ ID 331 

SequenceDescription : 



Sequence 
<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MISTTRIDFL WILSVAFASM IALATLLTLI NQWGTPYIP GGDSPAGTDC SELASWVSNA 60 
ATARPVFGDR FNTGNEEAAL AARGFQQGTA PNALVT GWNG HHTAVTLPDG TPVSSGEGGG 120 
VRVGGGGAYQ PKFTHHMYLP MDVDAGEDQP PAPDEPVTAV DDVEPEMPAP CPTQRPPVTP 180 
RHNLCNKLRT MP GALS AAL A AAAPVWPAPI SGCRGFSTSL LAKRNHPVIV GK 23 2 

<212> Type : PRT 
<211> Length : 232 

SequenceName : SEQ ID 332

SequenceDescription : 

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString :

MTTMITLRRR FAVAVAGVAT AAATTVTLAP APANAADVYG AIAYSGNGSW GRSWDYPTRA 60
AAEATAVKSC GYSDCKVLTS FTACGAVAAN DRAYQGGVGP TLAAAMKDAL TKLGGGYIDT 120
WACN 124

<212> Type : PRT 
<211> Length : 124 

SequenceName : SEQ ID 333 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv 
<400> PreSequenceString : 

MAGLNIYVRR WRTALHATVS ALIVAILGLA I TPVAS AATA RATLSVTSTW QTGFIARFTI 60 
TNSS TAPLTD WKLEFDL PAG ESVLHTWNST VARSGTHYVL SPANWNRIIA PGGSATGGLR 120 
GGLTGSYSPP SSCLLNGQYP CT 142 
<212> Type : PRT 
<211> Length : 142

SequenceName : SEQ ID 334 

SequenceDescription : 

Sequence 



<213> OrganismName : Mycobacterium tuberculosis H37Rv
<400> PreSequenceString : 

MLTRAIKTQL VLLTVLAVIA VWLGWYFLR IPSLVGIGRY TLYAELPRSG GLYRTANVTY 60 

RGITIGKVTG VEP TERGARA TMSIDNGYQI PTDASANVHS VSAVGEQFVD LVSTRTSGPY 12 0 

LRHGQTITTT TVPSQIGPAL DAANRGLAVL PKDRVASVLH EAS EAVGGLG SSLNRLIEAT 180 

QAIAHDVRGS LEDIDDIIER SAPIIDSQVN SGNEIARWAA NLNTLAAQTA QTDPAVRSIL 240 

ANAAPTADQV NATFSDVRES LPQTLANLEV VIDMLKRYHN GVEQALVFLP QSGAIAQSVT 30 0 

TEFPGQAGLG VGGLALNQPP PCLTGFLPAS EWRSPADTST APLPKGTYCR IPMDASNWR 360 

GARNNPCVDV PGKRAATPRE CRSNEAYVPG GTNPWYGDPN QMLSCPAPAA RCDQPVKPGQ 420

VIPAPSVNNG INPLPADQLP GTPPPVNDPL QRPGSGTVQC NGQQPNPCVY TPSTFPTTIY 480

DVQSGKWAP DGWYSVEAS THAGADGWKV MLAPTG 516 
<212> Type : PRT 
<211> Length : 516 

SequenceName : SEQ ID 335

SequenceDescription : 

Sequence 



<213> OrganismName : Rickettsia prowazekii 
<400> PreSequenceString : 

MLNNTQFLNL MKSYMKPEFY MSSIKNTTNL DLSSITNTIQ KAMNIFFTTN KISTESMQSL 60
FKKNSEIIQN NINTILNSTK EVINSKDFKQ ATEYHQKCVK SIYETSMDNA KELANIAYEA 120

SNKIFEAANK HITKNIHNAS NNIHNTAEQV QKNFNNKSA 159
<212> Type : PRT 
<211> Length : 159 

SequenceName : SEQ ID 336 

SequenceDescription : 



Sequence 
<213> OrganismName : Rickettsia prowazekii
<400> PreSequenceString :

MNIKLVTYFL ILVSSLKVNA DLNHIQDSFK YQEAEQLTIE LPWNDCTAIH KFLEEKLFFS 60 
EQQIKKENKI HEKYKQFYLQ HNNKLSDFSM QFLEKKSEIN SVETLISGFL KFCEDNFQTS 120
KSKSHSLNFF QKQQDQWLHN IRNENYKTYY KKKYEDNTFR NIN 163 
<212> Type : PRT 
<211> Length : 163

SequenceName : SEQ ID 337
SequenceDescription :

Sequence 



<213> OrganismName : Rickettsia prowazekii
<400> PreSequenceString :

MKKLLLIATA SATIIiSSSVS FAECIDNEWY LRADAGVAMF NKEQDKATGV KLKSNKAIPI 60
DLGIGYYISE NVRADLTLGT TIGGKLKKYG AATNTHFTGT NVSVSHKPTV TRLLINGYVD 120 
LTSFDMFDVF VGGGVGPALV KEKISGVSGL ASNTKNKTNV SYKLIFGTSA QIADGVKVEL 180 
AYSWINDGKT KTHNVMYKGA SVQTGGMRYQ SHNLTVGVRF GI 222
<212> Type : PRT

<211> Length : 222 

SequenceName : SEQ ID 338

SequenceDescription : 

Sequence



<213> OrganismName : Rickettsia prowazekii 
<400> PreSequenceString : 

MKKNMRKQML KIISIIIISL LLSSCSESTR DENGLLTDSQ STIIRDYIIS QNSKNLKVNL 60 

KEKFGSNLKG VKLIGIKLTN EDLSGIDFTS CEILRTDFMG SNLEKAILTN SVIQESNFAD 120

SVIKNISGYN ADFQGSIFNN ITLQNTNFVQ SNFSDTAFNK STIINVNFEN SPCFSNVLWCH 180

SNIDSSNFQK THLKNNSFKN TNVMNSIFYG ADLGKSVINN TNFTNNYFES SDLSNTKFTS 240 

VIIKDSNFTQ SIFNSVNFNN IQSNNSFFSY TSFEDSTLHN IHLTKCDLQN STINSSVFNN 300

FKIDNAILTN MSLNDNTFNN LSIKNSNTNF VRINKSKGFN ITLLNTNYSN AIFSNNDLKE 360 

FKVINTDLNN SEIINSNFTN GQFNNVNFSQ SLIQNVNFTD VKITLGNLNQ VALINSNLIN 420

TNIINSVLSN SQINNINYQA YYSFINTNVS NNIVINDNSN QIPPNNIVIN SEKDLQNISN 480 

LANMNLTNFN LSNLVFNGVD FSKSIFKKAN LTNTVIKNSI LKDANFSAAI LTKTDFSKSI 540
LTGSIFKFAQ IDQTCFSNSD LTNTDFTEAT IKNTAFDNAN THGIKGLE 588
<212> Type : PRT
<211> Length : 588

SequenceName : SEQ ID 339 
SequenceDescription : 

Sequence 

<213> OrganismName : Porphyromonas gingivalis W83
<400> PreSequenceString : 

MIQKFTNVKL NDMRKILSFL MMCSLHLGLQ SQTWHGDPDS VAALPSIGIQ ESSCTRITFE 60 

WFPGFYSVE KREGNQVFQR ISMPGCGSFG NLGEAELPVL KKMIAVPEFS TANVAVKIKE 120

TETFDNYNIY PNPTYWEEL PEGGTYLVEA FAINNDYYSQ NVSLPSTHYV YSQDGYFRSQ 180

RFIEVTLYPF RYNPVRQEIL FAKKIEVTIT FDNPQPPLQK NTGIFNKVAS SAFINYEADG 240

KSAIENDMVF SRGTTTYISG NVASNLPQNC DYLVIYDDMF NVNQQPHDEI KRLCEHRAFY 300

NGFDVAAVSI KDVLNSFPSN ATSYINETKL KNFIRSVYNQ SNAKRTLDGK LGYVLLIGKP 360

LSKYLADTDN TKVPTSFIHN VSLIPSHPTF GSICASDYFF SCVSPLDTVG DLFIGRFSVT 420 

NAHELHNLIE KTINKEISYN PIAHKNILYA EGKGCDAPIL RLFLKEIASG YTVNSILKSN 480

QVSAIDSIFD CLNNGSHHFY FNTHGMPTVW GIGQGLDVNT LTARLNNTSS QGLCTSLSCS 540

SAVADSTIRS LGEVLTTYAP NKGFSAFLGG SRATQYAVYL EGPCPPSEFY EYLPYSLYHN 600

LSTWGEMLL SSIINTNSVD TYSKFNFNLL GDPALNIMAH GMEVSNCITL PNNTIISSPI 660 

TIKNGGCLKI PEKGVLHFTN NGSIQVMSGG TLEIGNQAKI SGETGANPTF ITVYGDGLAI 720

NKQVEIDNID RLNLFSTHSV MPKFHFDSVK FNSAPLYTTN CIVEISNCEF TNRSDIISKN 780

CDLSVENSMF SSSGITVFKP MATSSITGLS TKAKITDNTF FATGNFAYHI TNTPGLTATS 840 

NAAIKLDNIP EYYISGNKIV NCDEALVLNN SGNRTNRLHN ITRNVIKNCR IGSTLYNSYG 900 

IYNRNKISNN HIGVRLLNNS CFYFDNAPVI NEEDKQTFIS NRTWQLYSSN GTFPLNFHYN 960

SLQGGDTDTW IYNDTYTNRY IDVSNNHWGN NDLFDPNQVF NTPDLFIWIP FWDGLPNGRS 1020 

GNSSAEAVEF QTALDCIGNS DYLSAKVALK MMVETYPESD FAIAALKELF RIEKMSGNDY 1080

EGLKDYFRSN PTIISSQNLF PTADFLSARC DIVCENYQSA IDWYENRLNS EISYQDSVFA 1140 

VIDLGDIYWN MQLDSLRGTG IDLNILSCEQ RKSLESHQNV KNYLLSTLPE STGTLLPPLE 1200 
CNKSSLDKSK IISISPNPAK AWTIIYYTD NPSCSVIKIY GINGASADIT GLPKHLSEGY 1260

YSIQFNTSNF DPGFYLVTLN VDQKIIDTEK LRIK 1294 
<212> Type : PRT 
<211> Length : 1294 
SequenceName : SEQ ID 340

SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MQGKNTIVTT GDYSIGLLSQ TSGNLNTDTI IRVNSDGSVT PSFSDGDDTF IVTAGNHAVG 60 

VLACASPGSA CACVSSLDEE STADTGSNEN NAIAKLDMAK GEITTHGTES YAAYANGTW 120 

KAGDTLDYTN ASVTLTDVDI TTHGDNAHAI AARQGTVSFN QGEIYTTGPD AAIAKIYNGG 180 

TVTLKNTSAV AHQGSGIVLE SS1NGQEATV DILSGSSLRS ANEILYHKDE TSNVTITDSE 240

VSSAADVFIN NIKGHLTVDA TNSKITGSAN ISTDDNTHTY LSLSDNSTWD IKADSTVSNL 300

TVDNSTVYIS RADGRDVEPT RLTITENYVG NNGVLHLRTE LDDDNSATDK WINGNTSGT 360

TRVKVTNAGG SGAYTLNGIE IISVEGESNG EFIKDSRIFA GAYEYSLTRG NTEATNKNWY 420

LTNFQATSGG ETNSGGSSAP TVAPTPVLRP EAGSYVANLA AANTLFVMRL NDRAGETRYI 480

DPVTEQERSS RLWLRQIGGH NAWRDSNGQL RTTSHRYVSQ LGGDLLTGGF TDSDSWRLGV 540

MAGYARDYNL THSSVSDYRS KGSVRGYSAG LYATWFADDI SKKGAYIDSW AQYSWFKNSV 600

KGDELAYESY SAKGATVSLE AGYGFALNKS FGLEAAKYTW IFQPQAQAIW MGVDHNAHTE 660 

ANGSRIENDA NNNIQTRLGF RTFIRTQEKN SGPHGDDFEP FVEMNW IH3STS KDFAVSMNGV 720

KVEQDGVSNIi GEIKLGVNGN LNPAASVWGN VGVQLGDNGY NDTAVMVGLK YKF 773 




<212> Type : PRT 

<211> Length : 773 

SequenceName : SEQ ID 341
SequenceDescription : 


Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MTKLKLLALG VLIATSAGVA HAEGKFSLGA GVGWEHPYK DYDTDVYPVP VINYEGDNFW 60
FRGLGGGYYL WNDATDKLSI TAYWSPLYFK AKDSGDHQMR HLDDRKSTMM AGLSYAHFTQ 120

YGYLRTTLAG DTLDNSNGIV WDMAWLYRYT NGGLTVTPGI GVQWNSENQN EYYYGVSRKE 180 
SARSGLRGYN PNDSWSPYLE LSASYNFLGD WSVYGTARYT RLSDEVTDSP MVDKSWTGLI 240 
STGITYKF 248 

<212> Type : PRT

<211> Length : 248

SequenceName : SEQ ID 342
SequenceDescription :

Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MKKIALAGLA GMLLVSASVN AMSISGQAGK EYTNIGVGFG TESTGLALSG NWTHNDDDGD 60 

VAGVGLGLNL PLGPLMATVG GKGVYTNPNY GDEGYAAAVG GGLQWKIGNS FRLFGEYYYS 120

PDSLSSGIQS YEEANAGARY TIMRPVSIEA GYRYLNLSGK DGNRDNAVAD GLYVGVNASF 180 

<212> Type : PRT 
<211> Length : 180
SequenceName : SEQ ID 343

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MTTLTARVFT TAEIIYRKTV IALVCHLNCS RQETVTMNKT IMALAIMMAS FAANASVLPE 60 

TPVPFKSGTG AIDNDTVYIG LGSAGTAWYK LDTQAKDKKW TALAAFPGGP REQATSAFID 120 

GNLYVFGGIG KNSEGLTQVF NDVHKYNPKT NSWVKLMSHA PMGMAGHVTF VHNGKAYVTG 180

GVNQNIFNGY FEDLNEAGKD STAIDKINAH YFDKKAEDYF FNKFLLSFDP STQQWSYAGE 240

SPWYGTAGAA WNKGDKTWL INGEAKPGLR TDAVFELDFT GNNLKWNKLD PVSSPDGVAG 300

GFAGISNDSL IFAGGAGFKG SRENYQNGKN YAHEGLKKSY STDIHLWHNG KWDKSGELSQ 360
GRAYGVSLPW NNSLLIIGGE TAGGKAVTDS VLISVKDNKV TVQN 404
<212> Type : PRT 
<211> Length : 404 

SequenceName : SEQ ID 344 

SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :

MATGGAALAG KAVMGAAAGA AGGASALQAA FQKASASMET GGDMSSMGSV VSSGGNGGGE 60 
AGTAGSSPFA QAAGFGDSGS SSSGGGFAKA AKLATGTASE LAKGVGSQVK QGFQERVSET 120
TGGKLAASIR ESMEPKEASQ SGQFEGNSLG ADSGPDSNEV RS 162
<212> Type : PRT
<211> Length : 162 

SequenceName : SEQ ID 345
SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MKRVLIPGVI LCGADVAQAV DDKNMYMYFF EEMTVYAPVP VPVNGNTHYT SESIERLPTG 60 

MGNISDLLRT NPAVRMDSTQ STSLNQGDIR PEKISIHGAS PYQNAYLIDG ISATNNLNPA 120

NESDASSATN ISGMSQGYYL DVSLLDNVTL YDSFVPVEFG RFNGGVIDAK IKRFNADDSK 180

VKLGYRTTRL DWLTSHIDEN NKSAFNQGSS GSTYFSPDFK KNFYTLSFNQ ELADNFGVTA 240 

GLSRRQSDIT RADYVSNDGI VAGRAQYKNV IDTALSKFTW FASDRFTHDL TLKYTGSSRD 300

YNTSTFPQSD REMGNKSYGL AWDMDTQLAW AKLRTTVGWD HISDYTRHDH DIWYTELSCT 360

YGDITGRCTR GGLGHISQAV DNYTFKTRLD WQKFAVGDVS HQPYFGAEYI YSDAWTERHN 420

QSESYVINAA GKKTMHTIYH KGKGSLGIDN YTLYMADHIS WRNVSLMPGV RYDYDNYLSN 480

HNISPRFMTE WDIFADQTSM ITAGYNRYYG GNILDMGLRD IRNSWTESVS GNKTLTRYQN 540

LKTPYNDELA MGLQQKIDKN VIARASEAHD QISKSSRTDS ATKTTITEYN NDGKTKTHSF 600 

NLSFELAEPL HIRQVDINPQ IVFSYIKSKG NLSLNNGYEE SNTGDNQWY NGNLVSYDSV 660

PVADFNNPLK ISLNMDFTHQ PSGLVWANTL AWQEARKARI ILGKTNAQYI SEYSDYKQYV 720

DEKLDSSLTW DTRLSWTPQF LKQQNLTISA DILNVLDSKT AVDTTNTGVA TYASGRTFWL 780 

DVSMKF 786 
<212> Type : PRT 
<211> Length : 786 

SequenceName : SEQ ID 346 

SequenceDescription : 



Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T
<400> PreSequenceString : 

MKKTLLAIML AGTAFASQAG TLVSQGTEAS ANLTLTKPIV VNNTIQPVKG VYSGTLTAWT 60
PLATGIVGAS DGQSHDYAVT FPDDIYAESS TSADAVISGD NNPDHKLKVS LTTLEQDPPS 120

AASEEIGGKR YMMLKNTGTG GAYRWSHMK EQWEPDSYT IRTQAYIYAE 170 
<212> Type : PRT 
<211> Length : 170 

SequenceName : SEQ ID 347 

SequenceDescription : 

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :

MGIYHWSRKT KMKRTKSIRH ASFRKNWSAR HLTPVALAVA TVFMLAGCEK SDETVSLYQN 60

ADDCSAANPG KSAECTTAYN NALKEAERTA PKYATREDCV AEFGEGQCQQ APAQAGMAPE 120

NQAQAQQSSG SFWMPLMAGY MMGRLMGGGA GFAQQPLFSS KNPASPAYGK YTDATGKNYG 180 

AAQPGRTMTV PKTAMAPKPA TTTTVTRGGF GESVAKQSTM QRSATGTSSR SMGG 234 



<212> Type : PRT 

<211> Length : 234 

SequenceName : SEQ ID 348 
SequenceDescription : 
Sequence



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :
MTKMSRYALI TALAMFLAGC VGQREPAPVE EVKPAPEQPA EPQQPVPTVP SVPTIPQQPG 60
PIEHEDRTAP PAPHIRHYDW NGAMQPMVSK MLGADGVTAG SVLLVDSVNN RTNGSLNAAE 120
ATETLRNALA NNGKFTLVSA QQLSMAKQQL GLSPQDSLGT RSKAIGIARN VGAHYVLYSC 180

ASGNVNAPTL QMQLMLVQTG EIIWSGKGAV SQQ 213 
<212> Type : PRT 
<211> Length : 213

SequenceName : SEQ ID 349

SequenceDescription : 

Sequence 

<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString : 

MTKLMQFVQR CYYMTNKKMY FILILVFTLL QVCFFALWKA RDGSTTSLEC TSTLTRNAKT 60 
DHSLYYSANL SVILKKDGSG SFTIVGLTDE DTPRKFSHSY FFTYKIDSNG RISGNAKAKV 120

SGLENQIKDE NFRLNFLDAS LTGKGNARLS KFNNVYIFSI PGLIINTCAP I 171

<212> Type : PRT 
<211> Length : 171 

SequenceName : SEQ ID 350 
SequenceDescription :

Sequence 



<213> OrganismName : Shigella flexneri 2a str. 2457T 
<400> PreSequenceString :

MGRISSGGMM FKAITTVAAL VIATSAMAQD DLTISSLAKG ETTKAAFNQM VQGHKLPAWV 60 
MKGGTYTPAQ TVTLGDETYQ VMSACKPHDC GSQRIAVMWS EKSNQMTGLF SAIDEKTSQE 120

KLTWLNVNDA LSIDGKTVLF AALTGSLENH PDGFNFK 157 
<212> Type : PRT 
<211> Length : 157

SequenceName : SEQ ID 351
SequenceDescription : 

Sequence 

<213> OrganismName : Streptococcus mutans UA159 

<400> PreSequenceString : 

MKKQFLEKAV FTVAATAATV VLGNKMADAD TYTLQEGDSF FSVAQRYHMD AYELASMNGK 60 

DITSLILPGQ TLTVNGSAAP DNQAAAPTDT TQATTETNDA NANTYPVGQC TWGVKAVATW 120 
AGDWWGNGGD WASSASAQGY TVGNTPAVGS IMCWTDGGYG HVAYVTAVGE DGKVQVLESN 180

YKDQQWVDNY RGWFDPNNSG TPGSVSYIYP N 211 

<212> Type : PRT 

<211> Length : 211 

SequenceName : SEQ ID 352
SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus mutans UA159 

<400> PreSequenceString :

MSIKNILENK TTTIKVSFAG IATAASLILP MAVQAETTYT VKSGDTLSEI ASTHGTTVDK 60

LAKLNKINNI HLIHAGQILE LDAATEDTDA TPVQESQINE AETSASAKTS QTSEVTTTAP 120 

VQESQTSEVI TSAPAETSQT SEVPTEANQT NEVSSAVSVE TSQTSEATTS APVETSQTSE 180 

ATTAEPTETK TSQTNEVAAS AEENQTTSNT SGLSTSDAAA KEFIAQKESG GNYNAKNGQY 240

YGRYQLSDSY LNGDLSEENQ ERVADAYVSS RYGSWTAAQA FWNANGWY 288



<212> Type : PRT 

<211> Length : 288 

SequenceName : SEQ ID 353
SequenceDescription : 


Sequence 
<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MKCQAFEDFK ATSLNKLSYT TGGATDGEII ANRMLQGKAT KGEITMYTWN IIQNGWVNSL 60 
VSWGIGGYNS SIGYSAQGNR GFSNYPYDVS MDSDNSSSSS NTTGGYVNYN QSFNSGW 117 


<212> Type : PRT 
<211> Length : 117 

SequenceName : SEQ ID 354 

SequenceDescription : 


Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MRYSQICRKS LALLATGMIL TTSTLPSISI LAEDSTGAPA RPDGQAPAGG GANTTTYDYS 60

GINSGVLVAN GSKVTSSSKT KSTTSAQNTA LVQNGGSLTL HKANLIKSGD DNNGDNDNFY 120

GINSILLAVN ERSKAYVSNS KLKASSSGSN GIFATDKATI YANKTSIATT ADNSRGLDAT 180 

YNGNIIANKM AISTKGAHSA AIATDRGGGN ISTTNSSLNT SGSGSPLLYS TGNIQWHVT 240 

GTSSNSQIAG MEGLNTILIH NSNLISTMTN KTASDPIANG VIIYQSQSGD AEATTGQSAH 300

FELSKSKLTS SITSGSMFYL TNTSANIILN QSTLNFDANK AKLLTVAGNS ANNWGTPGSN 360

GATVNFTGHK QTLKGDVDVD SISTLNMYLL DKTNYTGKTA VSTNSTNISP STSPITMNIS 420

KNSKWVLTGH STVTMLNAEK GAKIVDKDGK TVSVISSSGQ KLVKGKSKYS LTVTGTYSQK 480 

VTTSSSNKPS SSYINRSDFD NYFKTTTAFV NNTKNTSN 518 
<212> Type : PRT 

<211> Length : 518

SequenceName : SEQ ID 355 
SequenceDescription : 

Sequence 


<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MNKIGDTLRD ARIEKKLSFD DWDKTGIAP HYILAMELDQ LKLLPEGKTN EYLEKYAHAV 60 
GLDPVSIIHG YRNQEMSDEL ILPSSAELAA SSDSNIEKKN EGKSIEEPQE LAIDSLDVTQ 120 

NITEETPQIE DFKVESEEAS KKIEKIPSRL SKYDYDEEPK KKFPWALILL ILLALTIISY 180
VGYWYNQLQ TDSNKTELST STKKSKDTKN DANSTTQSQT SITTDFADGG NNITLSNTNG 240
KVEVTFTLTG DEESWSATN TTDGESGTTL TATDKTYTVT LAEGSTTSML TVGSPSGVEI 300

TINGQKVDTT NLVNAGLTNI NLTVQ 325

<212> Type : PRT 

<211> Length : 325

SequenceName : SEQ ID 356
SequenceDescription : 

Sequence 


<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MKSRKRQRKG LVRKNEIIIL TLFVASAVSL LAFTNSFGVL AKSLHLEKIN KSITISLPFG 60
KKKMEQTARY YSGEQVQISS SAKKDSLGKG LSHYQNWIGT VKKIKSQKDS RQKHHYSYEV 120 

TFDNGKALKY VQEKDLVKTK RSKYSKGQIV KLKSSATADL DGSSLTDYRA SAGKIDHISY 180
NHSNTTGGYK YDITFDEGGK VTNIQEKDLD KVYEVQLKSE NTAAQNNEIL KQAFAYAKQH 240
SGTILSLPNG EFKIGSQTPD KDYITLTSDT EIRGDNTTLL VEGSAYWFAF ATGTSASDGV 300

KNFTMRNINI KASDLEKGNQ FMIMADHGDN WKICNNSFTM VHKKGSHIFD LGSLQNSAFE 360 
GNQFTGYAPE LTNVSKIDDN ADLHDFYSEV IQLDAAESSG VWDGGLIKAI DPNYENYNKE 420 

KQLCNNITIA NNSFVPYIDS HGKIIAYSGT IGQHSSDVGL VKIYDNVFSN SLVSRFNQNG 480
KSEAWIFKAI HLKSNYNNAV YANSIS 506 
<212> Type : PRT 
<211> Length : 506 

SequenceName : SEQ ID 357

SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MRKLKVALFA SSILGMLAVS SYTAADTEDN QVTISHYNEQ AGTFDVNAVQ AANGKTIQSI 60
DVAIWSEENG QDDLKWYHAS NDGSNQLTVH FNAENHGSKV GSYIAHAYIT YTDGNRVGVN 120 
LGKRKLSLSA PQLSLKQGGL QLFSKLKPSA ADQLFSAVWS DENGQDDLHW YTADADGNTL 180
AGYANHK1GYG TYHVHTYLKQ NGKMIPISAQ DIDIPKPKVK IQIDKINDTS YDWVNNVPP 240 
YISSVAIPVW SEQNGQDDLK WYQATKVADG IFKTTVYLKT HRFELGNYQA HIYGDSQLSK 300

KLDGLGETHF NVPSIINYED PQVTIDHYNI NKGTFDVTVA ETDNSKAIQS ISAAVWSDAN 360

QANLYWYEAK QLANGKAAIT VDVQKHGNQT GSYNVHVYVH YNDGTTSGHV LANQQLNQIV 420
HYQPSAVRIT AYMNEKNTYP VGQCTWGVKE LAPWIPNWLG NGGQWASTVA VKGFKIGTVP 480
KVGAIACWSD GGYGHVAYVT HVESNNRIQV KEANYKNQQY ISNFRGWFDP TTSYLGRLTY 540
IYPD 544

<212> Type : PRT
<211> Length : 544

SequenceName : SEQ ID 358
SequenceDescription :

Sequence



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString : 

MANNYSRRQQ PTKKTKGTSR KRPTEHIKTG FSALQKSVAI IAGILGIITA LITINNYRNS 60 
SHNDKKDSTS KTTIIKEKEV DDSNSNNNAA NSQAENDSNN NNNSAESNQN QTATTANDSN 120

SNSANQWQAN SQSQANNQQN QNNANAGQ 148
<212> Type : PRT 
<211> Length : 148 

SequenceName : SEQ ID 359

SequenceDescription :



<213> OrganismName : Streptococcus mutans UA159 

<400> PreSequenceString :
MKIFSFGTIR NNTALKPNYD DTTAFSGFGT IRNNTALKQS TNCASWFNRF GTIRNNTALK 60

LTTLINGVSF CFGTIRNNTA LKPRGPIFVS TFRNRAIHLS QISASK 106

<212> Type : PRT 

<211> Length : 106 

SequenceName : SEQ ID 360
SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus mutans UA159

<400> PreSequenceString :

MKRKRNIjYFL IGLFLTVFLL XGCSMQKKTK SESSSTSQKT TLQTKQSSEK STDAKQTTEA 60 
HSESSQSSSH SNNEETIiAP I DTGAVLKADY SSMAGTWKNE EGQTLTFDQR GLTTPGMTVS 120 
LLNIDQDGNL LLNVETGTKK NLTLYIVPAN KTLSNQYFSN GQSDESDKTK DRIVSSESLN 180 
SGKFTNRVYY HVSTH 195

<212> Type : PRT

<211> Length : 195 

SequenceName : SEQ ID 361 
SequenceDescription :

Sequence



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MTPKKIKIAL TALISLMLAL FLFLFNHHSV RENSQQEKLK ISKASSKKSQ TSTSSVMTSS 60
RKATEQTSQA QTQSQSQAEQ SNPNVILPIP QELVGTYKGS SPQASEITFT ISSNGQLRAQ 120

ANFDPASDIN DVTATVSGVR KVGADTYIWE FVSGSSAALL PGVTGIGGLG KMQPGFILKG 180 

GQLTPIMFTG SVDGEIDYSH PNPYPVSLNK Q 211 

<212> Type : PRT 

<211> Length : 211 
SequenceName : SEQ ID 362

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString :

MKKIINVIVL SLSVFFLIAC SNSSTGEKTS QSSEETKVRL IVKTDSNKTD EKVAFKKGAT 60 






VMDVLKDNYK VKESGGFITT IDGVTQDKKA GRYWMFDVND KLASKAADKI KVKNGDKIEF 120






YLKVYKGKN 129

<212> Type : PRT

<211> Length : 129

SequenceName : SEQ ID 363
SequenceDescription :

Sequence



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString : 

MSNKPWEEKV TDATTDNEEM TRNSKDASII STPILTILLS LFFLIIIGIL FFVLYTSNGG 60 
SNEKAATSGF YSSSKTVKKA KNEANSQTDE QTTEAETSSS ETTSSSSDSD GETITVQGGE 120
GAAAIAARAG ISVDKLYELN PEHMTHGYWY ANPGDNIKIK 160
<212> Type : PRT

<211> Length : 160 

SequenceName : SEQ ID 364

SequenceDescription : 

Sequence



<213> OrganismName : Streptococcus mutans UA159 
<400> PreSequenceString :

MPDNRMNYSI DSNMQFPLVE ITLETGEFAY IQRGSMVYHT PSVTLNTKVN GRGSGLGKLV 60

GAIGRSVTSG ESFFITQAVS NASDGKLALA PSMPGQVIAL ELGEKQYRLN DGAFLALDGS 120

AQYQMKAQSV GRALFGGQGG LFVMTTEGQG TLLANSFGSI KKIELQNQEI TIDNAHWAW 180 

SRDLNYDIHL ENGFMQSIGT GEGWNTFRG TGEIYVQSLN LQQFAGVLQG FITNTNR 237 

<212> Type : PRT 
<211> Length : 237

SequenceName : SEQ ID 365 
SequenceDescription : 



<213> OrganismName : Streptococcus mutans UA159
<400> PreSequenceString : 

MKKNYFWYGL LGLLALYLIT IAFIPGFHIF FSNMLMLALF FMLIALSNRS IFFFFLALGF 60 

LSIYLKDIFH FDYSTGPLFT GIIIIGVILN SFLKPHYSYS YKGNHYFNMK QHANYIDNET 120 
DVFLKTLFSE NTSYVTSQEL NKIIIDTKFG EQSVDLSQAQ FMTDSPEIHI DVSFGETNLR 180

IPNNWKIINK THSPFASISF SGFPSTNGDF INVTLTGTVA MGSLNIQY 228

<212> Type : PRT 

<211> Length : 228 

SequenceName : SEQ ID 366 
SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6 
<400> PreSequenceString :

MKSITKKIKA TLAGVAALFA VFAPSFVSAQ ESSTYTVKEG DTLSEIAETH NTTVEKLAEN 60
NHIDNIHLIY VDQELVIDGP VAPVATPAPA TYAAPAAQDE TVSAPVAETP WSETWSTV 120 
SGSEAEAKEW IAQKESGGSY TATNGRYIGR YGSWTAAKNF WLNNGWY 167 
<212> Type : PRT 
<211> Length : 167

SequenceName : SEQ ID 367
SequenceDescription : 



<213> OrganismName : Streptococcus pneumoniae R6 
<400> PreSequenceString : 

MKHSHKKSFD WYSMQQRYSI RKYYFGAASV LLGTALVLGA AASVQTVQAE ENKQETTNSI 60 

SVGRGEAATK PAEVSASNKE KTYAAPTVAN PVETTPVKTE EVTKPAEKVE EAKDKKEEVT 120 

HQDAVDKSKL LTALSRAKKL ESKLYTEASA ANLQTSIQAG QSLLGKADAT EAELSAAESS 180

IQSFIIGLEL RSNSNKETVS ETPVAKKADA VESKEGAKPA ATTERSAVDS AILPTSTADK 240

VETTSAPASI NEILKLGLSL SDARQNPAIR KEDVNRGYSG FRAASNPANP IVSGSGNTVA 300 
FADISQGGRS YSFRGYGNSR GGNSIHYDVT TVRSGNSVNF TISYSAPGDS REFVNNNFIL 360

DKGDGB'GNPS NATITSSNPR VREQSKSISQ GANYVSHSGY SMTSAISTNT EQTIRFSLPI 420

INLNGDLSVR LKPVTFNVDQ GGGGAATSND PYSNSNYYYR ANPLYLDANP YGGTNNKTVS 480 

EDIDFQTVYIj PTSKLPEGQT RLVREGEKGQ RQITYKVHRF GNETLLGLPI SNSVTKEAKP 540 

RIMQIGVAKD LIDTVKPRVD QNKVGDTNNL TFYLDNDGNG VYTEGVDELV QKIAIKDGAK 600

GEKGDQGERG LTGAKGEKGD RGERGLTGAQ GAKGEKGDRG ERGLTGAQGA KGEKGDRGER 660

GLTGAQGAKG EKGDRGERGL TGAQGAKGEK GAQGERGLTG AQGAKGEKGD QGERGLTGAQ 720

GAKGEKGDQG ERGLTGAQGA KGEKGAQGER GLTGTQGAKG EKGDRGERGL TGAQGAKGEK 780

GDRGERGLTG AQGAKGEKGA QGERGLTGAQ GAKGEKGDQG ERGLTGAQGE KGDRGERGLT 840

GAKGEKGDQG ERGITGAKGE KGAQGERGLT GAQGAKGEKG DQGERGLTGA QGEKGAQGQA 900

GRDGVTPTVT VKDNKNDGTH TITINDGRGN VTSTWRDGF DGASPLVATQ RNDADKTTTV 960

I F Y YDKNGlSnST ELDASDKKLK EWIADGAKG EKGDKGEQGL QGRDGEQGPK GEDGKTPTVK 1020 

VTDGQDGTHT* ITINDGKGGI TTTWRDGFD GASPLVSTHR NEADKTTTVJ FYYDLMDNNQ 1080 

FDEGDTKLKB WIADGKQGP KGDKGDHGKD GFTPEVTVTD NNNGTHTITI TQPDNRPS-LT 1140 

TIVKNGEDGBC TPKVKAERDD AKKQTTLTFY IDKDGDGSYT AGKDELVQTT WKDGQDGAA 1200

GASGRDGKEV LNGKVDPTTE GKDGDTFVNT QTGDVFVKKG NTWEPAGNIK GPKGDKGADG 1260

AKGEKGAQGE RGLTGAQGVK GEKGDQGERG LTGSKGEKGD QGERGLTGAQ GAKGDKGEQG 1320

LQGRDGAQGP KGADGQRGPA GPQGPKGEQG NPGTPGKDGK SLIAVKNGVL VTITPVEGRP 1380

QTTFVEDGQEC GADGKTPTVT ITEGQNGTHT LTVHNPGSPD VTTTIRDGAT GQAGRDGKDV 1440

LNGKVNPQPN QGKNGDKYIN IETGDVYVKN NGNWDKEGNI KGPKGDKGAD GAKGEKGDQG 1500

ERGLTGAQGA KGADGAVGRD GRDGKDVLNG KANPEAHQGK DGDKYVNTET GDVFVKNNGN 1560

WDKEGNIKGP KGDKGADGAK GEKGDRGERG LTGAQGAKGA DGAAGRDGRD GRDGKDVLNG 1620

KVNPEANQGK DGDKYVNTET GDVFVKNNGN WDKEGNIKGS KGDKGERGED GKTPEVTVTP 1680 

GKDGHSTDIT FTVPGKDPVT VNVKDGENGL NGKTPKVDLL RVQGKNGNPS HTIVTFYTDE 1740

NNDGKYTPGT DELLGSEMIK DGAKGADGRD GKSLLTVKDG KETKVYQEDP ANPGQPLNPE 1800

KPLAVIRDGV DGKSPTVTAV RKDEAGHKGV EITVDNHDGS QPTTVFVQDG AKGKTGATGQ 1860

DGQTPTITTQ RGQDGQSTW TITTSGKDPV TFTVKDGKNG KDGRAPKIKV EDITSPSRIR 1920

RDTDAAATPT RNGIRVTVYD DVNDNGVYDE GVDKVLNSKD IYNGIDGRDG SAPTITTKDN 1980

GDGTHTITVQ NPDGSESTTV VKDGKDGKTA NITTTENPDG SHTITVTNPD GSTKETWKN 2040

GKDGKTPKVE VTDNNDGTHT VKVTDGDGNV TNAIIKDGKD GKAATATTTE NPDGSHTVTI 2100

TNPDGTKNEF WKNGRDGVD GRTPTASVRD NGDGSHTIVI TNPEGVTTET TVRDGKSPKV 2160

TITDEQNGTH KISVLNGDGT TTETXIKDGK SPVATVRDNQ DGTYTIRVEN GNGTVSETTV 2220

RDGKSPTAKV VDNGDGTHTI TWNSDGITT TTTVRDGREP KLEVIDNNDG SHTIKVTGAD 2280

GKGTTTTIFD GKSPKANIVD NGDGTHTLTI VDSDGREYKS IIKDGKDGKD SVSPTVTVKN 2340

NNDGTHWTX TNPDGSKTEM VIKDGKDGKS PKVSVEDNGD GSHTITIINS DGTVTKTVIK 2400

DGKDGRDGRO GRDGKDGKDG KCGCQDKPVT PSNDKPVPPT PNVPTPEVPV KPVPAQPTPN 2460 

VPTPEVPVQE* TPAVSTPEVP VKPVPAVPEQ PWPTPAQPA TPVNANPVAP TTGKENRGDK 2520 

LPETGSQSDY ISVLLGSGIL LSLYVGRRKE D 2551 
<212> Type : PRT 

<211> Length : 2551

SequenceName : SEQ ID 368
SequenceDescription :

Sequence 

<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString :

MKKRMLLASX VALSFAPVLA TQAEEVLWTA RSVEQIQNDL TKTDNKTSYT VQYGDTLSTI 60 

AEALGVDVTV LANLNKITNM DLIFPETVLT TTVNEAEEVT EVEIQTPQAD SSEEVTTATA 120

DLTTNQVTVD DQTVQVADLS QPIAEAPKEV ASSSEVTKTV IASEEVAPST GTSVPEEQTA 180

ETSSAVAEEA PQETTPAEKQ ETQTSPQAAS AVEATTTSSE AKEVASSNGA TAAVSTYQPE 240

ETKIISTTYE APAAPDYAGL AVAKSENAGL QPQTAAFKEE IANLFGITSF SGYRPGDSGD 300

HGKGLAIDFM VPERSELGDK IAEYAIQNMA SRGISYIIWK QRFYAPFDSK YGPANTWNPM 360

PDRGSVTENH YDHVHVSMNG 380 

<212> Type : PRT

<211> Length : 380 

SequenceName : SEQ ID 369
SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString : 

MTILGKDTVQ QSAKGESVTQ EATPEYKLEN TPGGDKGGNT GSSDANANEG GGSQAGGSAH 60 

TGSQNSAQSQ ASKQLATERE SAKNAIEKAA KNKQDEIKGA PLSDKEKAEL LARVEAEKQA 120

ALKEIENAKT MEDVKEAETI GVQAIAMVTV PKRPVAPNAA PKTTSAPQAT AGTMQDVTYQ 180

SPAGKQLPNT GSASSAALAS L GL WATS GF ALLGRKTRRR K 221 
<212> Type : PRT

<211> Length : 221

SequenceName : SEQ ID 370
SequenceDescription :


Sequence 



<213> OrganismName : Streptococcus pneumoniae R6 

<400> PreSequenceString :
MMTTGCSMGA YHALNFFLQH PDVFTKVIAL SGVYDARFFV GDYYNDDAIY QNSPVDYIWN 60

QNDGWFIDRY RQAEIVLCTG LGAWEQDGLP SFYKLKEAFD KKQIPAWFAE WGHDVAHDWE 120

WWRKQMPYFL GNLYL 135

<212> Type : PRT 

<211> Length : 135 
SequenceName : SEQ ID 371

SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6
<400> PreSequenceString :

MNKGLFEKRC KYSIRKFSLG VASVMIGATF FGTSPVLADS VQSGSTANLP ADLATALATA 60 

KENDGHDFEA PKVGEDQGSP EVTDGPKTEE ELLALEKEKP AEEKPKEDKP AAAKPETPKT 120 

VTPEWQTVEK KEQQGTVTIR EEKGVRYNQL SSTAQNDNAG KPALFEKKGL TVDANGNATV 180 

DLTFKDDSEK GKSRFGVFLK FKDTKNNVFV GYDKDGWFWE YKSPTTSTWY RGSRVAAPET 240

GSTNRLSITL KSDGQLNASN NDVNLFDTVT LPAAVNDHLK NEKKILLKAG SYDDERTWS 300

VKTDNQEGVK TEDTPAEKET GPEVDDSKVT YDTIQSKVLK AVIDQAFPRV KEYSLNGHTL 360 

PGQVQQFNQV FINNHRITPE VTYKKINETT AEYLMKLRDD AHLINAEMTV RLQWDNQLH 420

FDVTKIVNKN QVTPGQKIDD ERKLLSSISF LGNALVSVSS DQTGAKFDGA TMSNNTHVSG 480 

DDHIDVTNPM KDLAKGYMYG FVSTDKLAAG VWSNSQNSYG GGSNDWTRLT AYKETVGNAN 540

YVGIHSSEWQ WEKAYKGIVF PEYTKELPSA KWITEDANA DKKVDWQDGA IAYRSIMNNP 600

QGWKKVKDIT AYRIAMNFGS QAQNPFLMTL DGIKKINLHT DGLGQGVLLK GYGSEGHDSG 660

HLNYADIGKR IGGVEDFKTL IEKAKKYGAH LGIHVNASET YPESKYFNEK ILRKNPDGSY 720

SYGWNWLDQG INIDAAYDLA HGRLARWEDL KKKLGDGLDF IYVDVWGNGQ SGDNGAWATH 780 

VLAKEINKQG WRFAIEWGHG GEYDSTFHHW AADLTYGGYT NKGINSAITR FIRNHQKDAW 840

VGDYRSYGGA ANYPLLGGYS MKDFEGWQGR SDYNGYVTNL FAHDVMTKYF QHFTVSKWEN 900

GTPVTMTDNG STYKWTPEMR VELVDADNNK VWTRKSNDV NSPQYRERTV TLNGRVIQDG 960

SAYLTPWNWD ANGKKLSTDK EKMYYFNTQA GATTWTLPSD WAKSKVYLYK LTDQGKTEEQ 1020

ELTVKDGKIT LDLLANQPYV LYRSKQTNPE MSWSEGMHIY DQGFNSGTLK HWTISGDASK 1080

AEIVKSQGAU DMLRIQGNKE KVSLTQKLTG LKPNTKYAVY VGVDNRSNAK ASITVNTGEK 1140

EVTTYTNKSXi ALNYVKAYAH NTRRNNATVD DTSYFQNMYA FFTTGSDVSN VTLTLSREAG 1200 

DEATYFDEIR TFENNSSMYG DKHDTGKGTF KQDFENVAQG IFPFWGGVE GVEDNRTHLS 1260

EKHDPYTQRG WNGKKVDDVI EGNWSLKTNG LVSRRNLVYQ TIPQNFRFEA GKTYRVTFEY 1320

EAGSDNTYAF WGKGEFQSG RRGTQASNLE MHELPNTWTD SKKAKKATFL VTGAETGDTW 1380

VGIYSTGNAS NTRGDSGGNA NFRGYNDFMM DNLQIEEITL TGKMLTENAL KNYLPTVAMT 1440

NYTKESMDAXi KEAVFNLSQA DDDISVEEAR AEIAKIEALK NALVQKKTAL VADDFASLTA 1500

PAQAQEGLAN AFDGNLSSLW HTSWGGGDVG KPATMVLKEA TEITGLRYVP RGSGSNGNLR 1560 

DVKLWTDES GKEHTFTATD WPDNNKPKDI DFGKTIKAKK IVLTGTKTYG DGGDKYQSAA 1620

ELIFTRPQVA ETPLDLSGYE AALAKAQKLT DKDNQEEVAS VQASMKYATD NHLLTERMVE 1680

YFADYLNQLK DSATKPDAPT VEKPEFKLSS VASDQGKTPD YKQEIARPET PEQILPATGE 1740

SQFDTALFLA SVSLALSALF WKTKKD 1767 
<212> Type : PRT 
<211> Length : 1767 

SequenceName : SEQ ID 372

SequenceDescription :

Sequence 



<213> OrganismName : Streptococcus pneumoniae R6 

<400> PreSequenceString :

MKLYNKSELR YSRIFFDKRP PAFAFILIIS TAIILSGALV GAAYIPKNYI VKANGNSVIT 60

GTEFLSAISS GKWTLHKSE GDMVNAGDVI ISLSSGQEGL QASSLNKQLV KLRAKEAIFQ 120 

KFEQSLNEKY NRMSNSGEEQ EYYGKVEYYL SQLNSENYNN GTQYSKIQDE YTKLNKITAE 180 

RNQLDADLQT LQNELIQLQQ QGDSPSLSDT TSADDKAKLE TKILEITTKI EALKTNITSK 240 

NSEIDSQQSN IKDMNRTYND PTSQAYNIYA QLVSELGTAR SNNNKSITEL EANLGVATGQ 300

DKAHS ILAPlsT EGTLHYLVPL KQGMSIQQGQ TIAEVSGKEK GYYVEAFVLA SDISRVSKGA 360

KVDVAITGVlSr SQKYGTLKGQ VRQIDSGTIS QETKEGNISL YKVMIELETL TLKHGSETW 420 
LQKDMPVEVR IVYDKETYLD WILEMLSFKQ 450
<212> Type : PRT
<211> Length : 450

SequenceName : SEQ ID 373
SequenceDescription :

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 

<400> PreSequenceString :

MNKGLHRIIF SKKHSTMVAV AETANSQGKG KQAGSSVSVS LKTSGDLCGK LKTTLKTLVC 60 

SLVSLSMVLP AHAQITTDKS APKNQQWIL KTNTGAPLVN IQTPNGRGLS HNRYTQFDVD 120

NKGAVLNNDR NNNPFLVKGS AQLILNEVRG TASKLNGIVT VGGQKADVI I ANPNGITVNG 180

GGFKNVGRGI LTIGAPQIGK DGALTGFDVR QGTLTVGAAG WNDKGGADYT GVLARAVALQ 240

GKLQGKNLAV STGPQKVDYA SGEISAGTAA GTKPTIALDT AALGGMYADS ITLIANEKGV 300

GVKNAGTLKA AKQLIVTSSG RIENSGRIAT TADGTEASPT YLSIETTEKG AAGTFISNGG 360

RIESKGLLVI ETGEDISLRN GAWQNNGSR PATTVLNAGH NLVIESKTNV NNAKGSANLS 420 

AGGRTTINDA TIQAGSSVYS STKGDTELGE NTRIIAENVT VLSNGSIGSA AVIEAKDTAH 480

IESGKPLSLE TSTVASNIRL NNGNIKGGKQ LALLADDNIT AKTTNLNTPG NLYVHTGKDL 540

NLNVDKDLSA ASIHLKSDNA AHITGTSKTL TASKDMGVEA GLLNVTNTNL RTNSGNLHIQ 600

AAKGNIQLRN TKLNAAKALE TTALQGNIVS DGLHAVSADG HVSLLANGNA DFTGHNTLTA 660 

KADVNAGSVG KGRLKADNTN ITSSSGDITL VAGNGIQLGD GKQRNSINGK HISIKNNGGN 720 

ADLKNLNVHA KSGALNIHSD RALSIENTKL ESTHNTHLNA QHERVTLNQV DAYAHRHLSI 780

TGSQIWQNDK LPSANKLVAN GVLALNARYS QIADNTTLRA GAINLTAGTA LVKRGNINWS 840

TVSTKTLEDN AELKPLAGRL NIEAGSGTLT IEPANRISAH TDLSIKTGGK LLLSAKGGNA 900

GAPSAQVSSL EAKGNIRLVT GETDLRGSKI TAGKNLWAT TKGKLNIEAV NNSFSNYFPT 960 

QKAAELNQKS KELEQQIAQL KKSSPKSKLI PTLQEERDRL AFYIQAINKE VKGKKPKGKE 1020 

YLQAKLSAQN IDLISAQGIE ISGSDITASK KLNLHAAGVL PKAADSEAAA ILIDGITDQY 1080

EIGKPTYKSH YDKAALNKPS RLTGRTGVSI HAAAALDDAR IIIGASEIKA PSGSIDIKAH 1140

SDIVLEAGQN DAYTFLKTKG KSGKIIRKTK FTSTRDHLIM PAPVELTANG ITLQAGGNIE 1200

ANTTRFNAPA GKVTLVAGEE LQLLAEEGIH KHELDVQKSR RFIGIKVGKS NYSKNELNET 1260

KLPVRWAQT AATRSGWDTV LEGTEFKTTL AGADIQAGVG EKARVDAKII LKGIVNRIQS 1320

EEKLETNSTV WQKQAGRGST IETLKLPSFE SPTPPKLSAP GGYIVDIPKG NLKTEIEKLS 1380

KQPEYAYLKQ LQVAKNINWN QVQLAYDRWD YKQEGLTEAG AAIIALAVTV VTSGAGTGAV 1440 

LGLNGAAAAA TDAAFASLAS QASVSFINNK GDVGKTLKEL GRSSTVKNLV VAAATAGVAD 1500

KIGASALNTJV SDKQWINNLT VNLANAGSAA LINTAINGGS LKDNLGDAAL GAIVSTVHGE 1560 

VASKIKFNLS EDYITHKIAH AIAGCAAAAA NKGKCQDGAI GAAVGEIVGE ALTNGKNPAT 1620

LTAKEREQ XL AYSKLVAGTV SGWGGDVNT AANAAKVAIE NNLLSQEEYA LREKLIKKAK 1680 

GKGLLSLDWG SLTEQEARQF IYLIEKDRYS NQLLDRYQKN PSSLNNQEKN ILAYFINQTS 1740 

GGNTAWAASI LKTPQSMGNL TIPSKDINNT LSKAYQTLSR YDSFDYKSAV AAQPALYLLN 1800

GPLGFSVKAA TVAAGGYNIG QGAKAISNGE YLHGTVQWN GTLMVAGSVS AQAAISAKPA 1860 

PVTRYLSNDS APALRQALTA ESQRIRMKLP EEYRQIGNLA IAKIDVKGLP QRMEAFSSFQ 1920 

KGEHGFISLP ETKIFKPISV DKYHNIASPP RGTLRNIDGE YKLLETIAQQ LGNNRNVSGR 1980

IDLFTELKAC QSCSNVILEF RNRYPNIQLN IFTGK 2015 

<212> Type : PRT

<211> Length : 2015 

SequenceName : SEQ ID 374 
SequenceDescription : 

Sequence



<213> OrganismName : Neisseria meningitidis Z2491 
<400> PreSequenceString :

MDLIQTPNKQ FVDGDRRTPG TPVPAWWLNQ LQGELYSILN AVGIEPNKAD HAQVLSAIKT 60

LAADASQVAS IDALRKYSGT GYVNVNAYHA NTTVGGGVFV ADKADKSTAD NGCTVIVSTD 120

GTRWKRVFSG MLNLHDFGYV ASKNNALSTL NAAESAALDV WDCLGLSID TGNIYPQKNK 180

YTNGKFVING KTVDVQYQPI RSGIGRFISG TGAAANLKSN EWTGAGLIVI GEGAMEQMEK 240

CVSSIAIGDR AQGFSKVSRD NIAIGADSLI NVQAATEWYD QSRMEGTRNI GIGGNAGRGI 300

TSGYSNVSIG RNAGQGLGEG SSNIALGAGA MAGTAPVGFS GDIEVFWPSS TSRTIAIGEA 360

VLQTYQGRAA QTAIGANAAR NTKKAEKVTA IGSAAMENLE RNRAPNGGDV VWTGTEAGTY 420

AQSGKNITDT FPNIRGAQAT YWVGIRLTSG TAQTLQNDW PAQWSVNGN TLIIQSSKEL 480 

TATGAAELKY VYSVNSTATK NEELTIIGAN AMNKALTAGY STIIGVDAAL LGDNYQKTTA 540

IGASSLRTGS HISTTAIGYW VIPLASSEKC VAIGDSAGYR NVQGDFLTGK ITNSIAIGYG 600

ARINGDNEIQ IGTTGQTLYA PTAVNIRSDG RDKADVKPLT NGLDFVMKLK PMTGYYDRRD 660

SYVDELFKDL PADERADKVR EWWANPIKDG SHKEDRLRHW FIAQDIAALE DEYGRLPMVN 720

KTNDTYTVEY ETFIPVLTKA IQEMAARIET LETEMKESKK 760 
<212> Type : PRT 
<211> Length : 760

SequenceName : SEQ ID 375
SequenceDescription :

Sequence

<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MKNISRKCFM TSWCIILGG ILLGAGYATG GLQDIKHQTA PKKVIKTFDQ ITALDIDSSA 60 

STITVETGPV QRPTVTYYTH PKFIDPIVTT LTGKTLSLSQ KPKDIVITGG IEILGFTLNN 120

SRQEKNYRSI TITVPEKTSL NEVKGSNVPH TTLSNLTVQD MQFDGNLTLL HTKVKKATIT 180

GMLEATKSQL TNLELKADYS FSNLTDSSVE NGTISLGMGQ LTTKDTTLKA INIQSLHPGG 240

IEAERTTLEN VTFTVSKSKE EEEENDYYDN DAIFTAHALT LKGTNTISGG DIDVDITLTK 300

AKAIAYRART ENGKVSLGSQ LTPAKIGKES TSDVISYVAE NKAATGNLTV NLNKGDITIK 360






<212> Type : PRT 

<211> Length : 360 

SequenceName : SEQ ID 376
SequenceDescription : 

Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232 
<400> PreSequenceString :

MFKKENLKQR YFNFGLVALA LTILAIIFAF SSKNADTKSY AKKSESKMVT IDKAPKNNHA 60
ITKEESKEKA KSIASEPIPT VENSVAPTVT EEAPWQQEV TQTVQQVSSV AYNPNNWLS 120

NGNTAGIVGS QAAAQMAAAT GVPQSTWEHI IARESNGNPN AANASGASGL FQTMPGWGST 180

ATVEDQVNAA LKAYSAQGLS AWGY 204

<212> Type : PRT 

30 <211> Length : 204 

SequenceName : SEQ ID 377
SequenceDescription : 



Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232 
<400> PreSequenceString :

MLEELKTLIK NPKLMITMIG VALVPALYNL SFLGSMWDPY GRVNDLP I AV VNHDKPAKRA 60 

DKSLTIGNDM VDKMSKSKDL DYHFVSSKSA QKGLKKGDYY MVITLPEDLS QRATTLLNPE 12 0 

40 PQKLTIRYQT SKGHGMVAAK MGEETAMAKLK ESVSQNITKT YTSAVFSSMT DLQSGLKEAS 180 

TGSQALDSGA KTAQMGSQML SDNLAGLSSA SWQFQQGTNR LTSGLTAYTA GVSQVKDGLG 240 

QLSTDMPVYL NGVSRLSQGA SQLNQGLSQL TQSTTLSDDK AKRIQSLEVG LPVLNQGIQQ 3 00 

LNENLSTMQV PKLNTDELGN NKAAIAQAAQ QLLVKEAAAH KEQLAVLQAT SAYQSLTAEQ 3 60 

QGELTAALTQ TDKGEAVAPA QTILRSVQTL STSLQSLSQE DQSKQLEQLK EAVAQIANQS 420 

45 NQALPGASSA LTELSTGLAK VUGS LNQQVL PGSNQLTTGL AQLNRYNTAI GSGVIKLSEG 480 

ANAL S S KSGE LLDGSHQLSE GATKLADGSS QLSQGGHQLT SGLTELSTGL SILNGSLAKA 540 

SQQLSLVSVT DKNAKAVAKP LVLNEKDKDG VKTNGIGMAP YMIAVSLMW AL S TNVI FAN 60 0 

SLSGRPVKDK WDWAKQKFVI NGFISTMGSI VLYLAIQLLG FEARYGMETL GFIMLSGWTF 660 

MALVTALVGW DDRYGSFASL VMLLLQVGSS GGSYPIELSG AFFQKLHPFL PMTYWSGLR 720 

50 QTISLSGHIG VEVKVLTGFL LAFMVLSLLI YRPKKTV 757 
<212> Type : PRT 
<211> Length : 757 

SequenceName : SEQ ID 378
SequenceDescription : 



Sequence 



<213> OrganismName : Streptococcus pyogenes MGAS8232 
<400> PreSequenceString :

60 MSRDPTYTIN EHDLSFADGR FYVTFKADKS SETVRLNSSC LGNTIIKKLQ VEDDNTMHDF 60 

VKPKVTTQQA FGLAQQVKEL DLQLKDPKSD LWGKIKFNNK AMLVEYANKE MSSAIAQSAE 120 

QILLQVKSID DERYSKFEQT LHGI KQTVKS ESVESARTQL ASMFDSRISG LDGKYSRLSQ 180 

TIDSLSSRLD DGVGNYSTLS QKVSGIDLRV SNAANDVSRL SQTAQGLQSQ ITNANQNYSS 240 

LSQTVQGLQT TVRDNQSNAT SRINQLSDLI STKVSKGDVE TTIAQSYDKI AFAIRDKLPA 3 00 

65 SKMSGSEIIS AINLDRSGVK I TGKNI TLDG NSYISNAVIK DAHIANMDAG KINTGYLNAN 3 60 

RIATEAITGE KIKMDYAFFN KL,TANEGYFR TLFAKD I FAT SVQSVTLSAS KITGGVLAAT 420 

NGASQWDLNN ANMTFNRDAT IKTFKTSKNNAL VRKDGTHTAF VHFSNATPKG YRGSALYASI 480 
GITSSGDGID SASSGRFAGL RSFRYATGYN HTAAVDQTEL YGDNVL I ADD F S I NRGFKFR 540
PDKMEKVLDM NDLYAAWAL GRCWGHLANV GWNTAHSNFT SAVSRELNNY ITKI 594 

<212> Type : PRT 
<211> Length : 594

SequenceName : SEQ ID 379
SequenceDescription :

Sequence


<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MAADGKVTIL VDVDGKQVKV LNSELDKVAK HGDKGSSSLK KFAVGAGVFK LASAAVDLVS 60 

QSLGKAITRF, DTLEKYPRVM KAMGHSAEDV ARS TDK! G IDGLPTTLDE WGT AQRLT S 12 0 

15 ITKDINKSTN LTLALNNAFL ASGASSEAAS RGLEQYAQML SAGKVDMQAW KTLQETMPYA 18 0 

LQQTAEAFGF AGASAQKDFY EALKNGQITF DQFSNKLIEL NDGVGGFAEL AKENSKGIET 240 

S FHNT KNAI A KGVANS I KAL DDLSKAATGK GIADHFDSLK WINASFSAI NASIKASTPL 3 00 

FKLLFSVIGA GISWKALSP ALVGVAS GLA AMRAVNETIT M I KAXiNRAWV MAS AS M S I GA 3 60 

TTIKTVTAVQ AVS TTMTKAD MVARLSQLGV LKASTVIYGV MTGAISLSTA ATI AS TAAVT 42 0 

20 ALKAALVALT GPVGWVGAI GALVAVGVSL WSWLTKESDE TKKLKKEQEG LVESNKQLRD 48 0 

SVREGVQERK KGLESVKEST AAHQKLADEI IKLAAKENKT AGEKQNLKNK IDQLNGSIDG 54 0 

LNLAYDKNSN SLSHNADQIK SRISAMEAES TWQTAQQNLL NIEQKRSEVS KKLAENADLR 60 0 

KKWNE EANVS DSVRKEKIAE LTEEEAKLKN MQTQLQEEYN KTSATQQAAA DAMAAAEESG 660 

SARQVIAYEN MSEAQRTAID NMRTKYS ELL ETTTSIFDAI EQKTALSVDQ MNTNL EKNRA 720 

25 ATEQWATNLE ILAQRGVDQG ILEQLRRMGP EGATQTQVFV DATDAELAPL QENFRAATET 78 0 

AKNAMGSVLD SAGVEMPEKV KGMVTNVS T G LQAELQAANF AQLGQEIPNG VSQGISQGAG 840 

KASDASVKMG QEVKRSFQGE LGIHSPSRVF TEYGGHITDG LSNGVTNGTS KVMQTMQSLA 90 0 

QQMSQKGQQI VNDMRSKSNQ ITDAFSTMSG PMHSHGVNAM QGLANGIYAG SGAALAAAQS 960 

IAARITATIQ SALDIHSPSR VMRDEVGRFI PQGIAVGIDA DRKVTDS S MQ KLKESMTINA 1020 

30 TPEIASGFGG GVAGIANQTT NNSNNSFTLN VKVDESDGNS HEKYQRLFRE FSWYIQQQQG 1080 

RLGDVK 1086 
<212> Type : PRT
<211> Length : 1086

SequenceName : SEQ ID 380

SequenceDescription :

Sequence



<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString :

MAKEPWEEKI VDDTXGTRTR KSRNAFXSTP WLTALLSVFF VIIVAILFIF FYTSNSGSNR 60 
QAETNGFYGA STHKKTRKAS NAKKTSSSST TTDTTPSSEE TLASSEGTGE TLTVLAGEGA 120 
ASIAARAGIS VEQLQALNPE HMTQGYWYAN PGDQVTIK 158 
<212> Type : PRT 
<211> Length : 158

SequenceName : SEQ ID 381
SequenceDescription :

Sequence


<213> OrganismName : Streptococcus pyogenes MGAS8232
<400> PreSequenceString : 

MSKRGKIKIT TKTKLITASV ITLVLIITGV VLWKQQQNTL TADIAKEPYS TVSVTEGSIA 60 
SSTLLSGTVK ALSEEYIYFD ANKGNDATVT VKIGDQVTQG QQLVQYNTTT AQSAYDTAVR 12 0 

55 SLNKIGRQIN HLKTYGVPAV STETNKDEAT GEETTTTVQP SAQQNANYKQ QLQDLNDAYA 18 0 

DAQAEVNKAQ IALNDTWIS SVSGTWEVN NDIDPSSKNS QTLVHVATEG QLQVKGTLTE 240 
YDLANVKVGQ SVKIKSKVYS NQEWTGKISY VSNYPTESNA GSTTPAGSTG AGS S TGAA YD 3 00 

YKIDIISPLN QLKQGFTVSV EWNEAKQAL VPLTAVIKKD KKHYVWTYDD ATGKAKKVEV 3 60 

TLGNADAQQQ E I HKGVAVGD IVIANPDKNI KPDKKLEGVI SIGTNTKPEK DSQSKNKKSG 420 

VDK 423
<212> Type : PRT 
<211> Length : 423 

SequenceName : SEQ ID 382
SequenceDescription :

Sequence

<213> OrganismName : Treponema pallidum
<400> PreSequenceString :

MLRLPTARAC ITMGTMIRHT FTHRCGALLC ALALGSSTMA ATAAAKPKKG QMQKLRQRPV 60 
WAPTGGRYAS LDGAFTALAN DASFFEANPA GSAJSTMTHGEL AFFHTTGFGS FHAETLSYVG 120 
5 QSGNWGYGAS MRMFFPESGF DFSTTTEPVC TPASNPIKQR GAIGIINFAR RIGGLSLGAN 180 
LKAGFRDAQG LQHTSVSSDI GLQWVGNVAK SFTSEEPNLY X GLAATNL GL TVKVSDKIEN 240 
CTSTCEKCGC CKERCCCNGK KACCKDCDCN CPCQDCNDKG TVHATD TMLR AGFAYRPFSW 3 00 

FLFSLGATTS MNVQTLASSD AKSLYQNLAY SIGAMFDPFS FLSLSSSFRI NHKANMRVGV 3 60 

GAEARIARIK LNAGYRCDVS DISSGSGCTG AKASHYLSLG GAILLGRN 408 
<212> Type : PRT

<211> Length : 408

SequenceName : SEQ ID 383

SequenceDescription :

Sequence



<213> OrganismName : Treponema pallidum 
<400> PreSequenceString : 

MSRTFRAWQC VGALCALSPL LPAYSSEGVR EVPPSQSPQV WAYEPIRPG DQLLKIGIVA 60 
20 GCQLYIAGGN GTNGSSSSGT NGNGNGKLLG GGGFHLGYEY FFTKNFSLGG QVSFECYRTT 120 

GSNYYFSVPI TVNPTYTFAV GRWRIPLSLG VGLNIQSYLS KKAPGLIAEA SAGLYYQYTP 18 0 

DWSIGGIVAY TQLGDIASSP DKCRAVGLAT IDKGVRYHF 219 

<212> Type : PRT 

<211> Length : 219 
SequenceName : SEQ ID 384

SequenceDescription : 



Sequence 


<213> OrganismName : Escherichia 
<400> PreSequenceString :
atgataaatt taagtaagga agcaacggtg 
atgatgttgt cttttcctgt agcttctcaa 

35 gtatataacg ccaatggtgt gccagtcgtt 
tctcataata tctgggataa cctaaacgtt 
gctaatgaat ccagtacttc acttgccgga 
gggtcggcga aggtgatcct gaatgaggtt 
atgatggaag ttgcagggga taaagcggat 

40 gtaaacggtg gcggttcaat caatacaggt 
atccaggatg acaagctggc cggttactcc 
ctggataacg ccagcccgac agaaattctg 
tctgccgatg agctgaacgt tgttgctggc 
accggtagcg tatccgccac ggggtcccgt 

45 ggcggaatgt atgcgaacaa aatcagtctg 
aacctcggcg ttattgctgg gggtgttaat 
ttaaacagta acgcccagat tcagtctgca 
ctggataaca ccaccggtac ggtgacatct 
aatactatcg tgaatacccg tgcgggtaac 

50 agcggtacga ttgacaatac taacggcaag 
accaataacg ccacgctgat taactctggt 
ctcgtggcgc tgaaaaccgg aacgctcaac 
gtgggtcttg aatccgctgc gctgaataac 
atcgccatta tcagtaacgg taatgtggat 

55 gggcatatcg ttattggcgc ggcaggtagc 
accggcagtt ctgactctct gggcattatt 
aacatcaata acaacggcgg acagattgcg 
agcacgatcg acgactatgc gggcaaaatt 
agctctctgc gtaacgatac cggggggatc 

60 ggcggcagcc tgaccaataa tattggcgtg 
ttagccaact ccgtggataa ccacggcggc 
tcgatgtctg gcgtcaataa caacacagcg 
aatgcgcgcg gcagtatcga aaaccgcgat 
tacttcggca tgcctcagca aacgggtgga 

65 gggcagaaca tctataacaa caacagccgt 
caggcgcaga acacgttcga caacacgcgt 
attcaggttg gcggaacgta ttacaacaac 



coli O157:H7



ggcjaaagcat taacccctat tgctatactt 60 

gccjgcgggat tagtcataaa aaatggaacg 120 

gacatcaaca aacctaacgg tagcggttta 180 

gataaaaatg gtgtcgtttt caataatagc 24 0 

aatattcagg gaaacagtaa tctgacctcc 3 00 

acttccaaaa atccttcaac cattaatggg 3 60 

ctgattattg ccaacccgaa tggtattact 420 

aaacttacct taaccaccgg gacgccggat 480 

gtcjaacggcg gtaccattac gctcggtaaa 540 

tcccgtaacg tggtagttaa cggcaaagtg 600 

aataactatg ttaatgccgc aggccaggtg 660 

aacggttaca gcgtagatgt tgccaaactg 720 

gtcagcaccg agaaaggtgt gggggttcgc 780 

ggtgtcagca tcgattccaa aggtaacctg 840 

agcacgatca acctgacaac aaatggtact 900 

gtaggcacta tctcgcttaa taccaacaag 960 

atctctacga tgggcgatat ctacgttaac 1020 

cttgcggctg caggaatgct ggcggttgat 1080 

aaagggagtt ctgtcgggat tgaagcgggg 1140 

aacagcaatg gtcagattcg cggtggctat 12 00 

aacaacggtg atatccagac caccggcgat 1260 

aacaacaaag gtctgatccg ttcgtccacc 1320 

gtaaataatg gttcaaccaa aaccgccgat 13 80 

gcagataccg gcgtagaaat tggtgcgaac 1440 

tctaatggca acgtctccct gtcaagttac 1500 

ctgtccaaca gcaaagtgat tatcaaggga 1560 

agcggtaagc agggtattga agtcgccgtt 1620 

atcagctctg aagagggtga tatctccctg 1680 

ttcatgatgg ggcagaacat cacgatggag 1740 

ct<gatcgtgg ccagcaaaaa actgaagata 1800 

ggcaataact tcggtaatgc ttatggtctg 1860 

atcjgtcggca aggaaggcat cgagctttcc 1920 

cttatcgctg aggatggtcc tctgactctg 1980 

gctctggtca ccagcggggc ggatgcatct 2 040 

tacgctacca cctggagtgc gggcaacctg 2100 
gatatcgacg cgaccacgct gcaaaacagc
accgggttca tagcatctga taaaaacctg 
tacggctgga tcagcggtaa aggcgatgtt 
aaccgcaata ccattgcggc tgaaaagggg 
5 aactggaagg atatttctgc tggcggcgac 
aacaactcca acagcaatat ggtggggcag 
aacaaccgtg gcaacattgt cagtgacgct 
tataactatc tctatatggt agggtafcggg 
aacaataacg cgaccatcga agcgacaggc 

10 ggtaacaacc gcggtaatct gcatgcgttg 
ctgaacaacg ataacggtga aattcgfcggt 
aactacgaca gctataaggg ttcgctgacc 
aacattgtag acaacgccta tggtttgatt 
tcgacgattt acaacaacac tgcgqtcjatc 

15 ggcggcaacc tcgaaaaccg cgacggcgaat 
ggaattaccg acaacgttgg cggcatcgta 
aacgtctaca acaataacag cagcatcatc 
aggggaacgc tggataatac ccgcgcgctt 
gcggcaggga cgttctacaa caactatgcc 

20 tatgcggcgt cgttgaacaa cgccagogat 
gtgattgcgt ctgacaaaaa cctggatctg 
tggatcagcg gtaaaggaga tgtgcatttc 
aatgccatcg cggcggacaa cgcgctcgacc 
aaagacattg tggcgggtac tgcgctgact 

25 agcaacagta atatgttggg acaaaccatc 
cgtggaaata ttgtgggtga ttattctctg 
tacctcaata tgctgagtta tggtgtcgct 
ggtaaagacg ctgttctcgg tggcttctac 
aacaccggta ctattgtcgg catgtaa 

<212> Type : DNA

<211> Length : 3807

SequenceName : SEQ ID 385
SequenceDescription : 



agcagcggta cgatgatcga taacaatgcg 2160 

tcactggaag tggtgaatag ccttaccaac 2220 

gatgtcacgg tgaataacgg caacctgtat 2280 

ctggatattg ccgcgttgaa cggtattgaa 2340 

ctgacgatga acaccaatcg ccatgtgacc 2400 

aatattgtta ttaacgcggt taacgatatc 2460 

gacctgaacg tgacgaccaa aggcaacctg 2520 

gatatcgcat tgtcggcaaa tagcgtggcg 2 580 

gatctgatta tcgattcgaa gggtaacgtg 264 0 

aacggcgtgt tgtctgttaa aggcaacaat 2700 

tatggcgatg tcacgctggc actgacgggc 2760 

tctgaaacgg gcgacgtgac tctgacggcg 282 0 

gccggtgaga atgtttctgt cgatgctaaa 2 880 
gcggcgaaf^ aa^agctggt tattaacgct 2940

aacttcctgc gtaataacgg cgcgctgttt 3 0 00 

ggfcaaagaag gtgtcacgct ttctgctcag 3 060 

gctgaaaatg gtccgcttaa tcfcgctgtcc 312 0 

cttagcagtg gggctgatgc catcatccgt 318 0 

accacgtaca gcgccggtaa tctcgacgtt 3 240 

ggtcgcctgg aagacaatac cgccacgggc 33 0 0 

agcgttgata acagtgtcac taactatggt 3 360 

aatgttctga aaggcacgct gtataaccgt 342 0 

attaatgccc tgaacggtgt tgagaacttt 3480 

attgatacgc agaagtatgt taccaacaac 3540 

gcgatcaatg ccgtgaatga cattaataac 3 60 0 

ggtgttaaaa ccaccggtaa tatttataac 3660 

ggcgtatcgg caaataaggt tacgaatagc 372 0 

ggtttagcgt tagaagcaaa cgaaactgat 3780 

3807



Sequence



<213> OrganismName : Escherichia
<400> PreSequenceString : 
gtgaacacaa tacacttgcg ctgtctcttc 

40 gctgatgttg cagcaaagct aaggtccgctt 
atgaaattta tgaacaggac cagtccctat 
- atatctgcct tgatatatgc cccgcccggg 
gtggtaaacg atgagactgt agatgg-cagc 
aacactcata ttatcaacca tggccagcag 

45 cttattgaat ctggtggata tcaagatgta 
aataatacca ccattaacgg gggcagacag 
acgataatcg agagtggcaa tcaggacgtt 
attaagggcg gtgcttcacg cgtagaggga 
ggtagccaga tagtaaaagt tcaagggcat 

50 tctcaggacg tagtacaagg aagtctggca 
tatgttgaac agagcacagt agaaacaacc 
tatgagagcc gtgcgctgga cacgacgatt 
tcaacggcaa aaaatactca gatcta.ttct 
tcctcggatg ttattgaagt ttattccggt 

55 acaaatgtta cccagcacga tggtgcaatt 
agcggtacga atagtgaagg tgcattctcc 
ctggaaaacg gtggtcattt agacataaac 
aaagataaag gaacaatgtc agttttaacc 
aatggcgggg ttatggatgt tgcaggaaac 

60 cagaatatta ataattatgg catagccaca 
atcaaaagcg gcgggaaagc tgacacaaca 
gagaaagatg gtacggcaat tggcagcaat 
accggcggta ttgcacatgg ggttaaccag 
ggtgcaggga ctga'uatcga aggata.caac 

65 gaggctaatt atgttgtgct ggaaaatacc 
gcgaaaaata ctaccattga tgctggcggt 
gatagcacca gacttaataa tggcggcgtt 



coli O157:H7

aggatgaatc ccctggtctg gtgcctgtgg 60 

aaacgctact cagtattcac ttttcagagg 120 

tattgtcgtc gctcagtact ttccttattg 180 

atggctgcct tcactcctga tgttattggt 240 

caacgagtag atgaacgagg tacaacaaat 3 00 

aatgtttatg gcggggtatc taatggaagt 360 

ggaaggcata acaattatgt ggggcagtct 420 

tcaattcatg acgggggtat ttccacaggt 480 

tataaagggg gtatcagcaa tggaacgaca 540 

gggagtgcga atggaacact cattgatggt 60 0 

gctgatggta caacgataaa taagtctggc 660 

acgaacacaa ccataaatgg tggtcgacag 72 0 

accatcaaaa atggcggtga gcaaagagta 78 0 

gaaggcggaa ctcagtctct gaatagtaag 84 0 

ggtggtacgc aaattattga taacaccagc 90 0 

ggcgtgcttg atgttagtgg tggtacggca 960 

ttaaaaacta acactaacgg tacgacggtg 102 0 

atccacaatc acgtggcaga caatgtgttg 108 0 

gcatatggtt cggcaaacaa gacgattatt 114 0 

aatgctaaag ctgatgcgac ccgaatagat 12 0 0 

gcgacaaata ccataattaa tggtggcaca 12 60 

ggcaccaata tcaacagcgg aacgcaaaat 1320 

attatatcct ccgggagccg gcaggttgtt 13 80 

attagcgccg gaggctcgct gattgtctat 1440 

gagacgggca gtgctttagt tgccaacacg 15 00 

aagctctctc acttcactat taccggaggg 15 60 

ggcgaactga cggtagtggc taaaacctcg 1620 

aagctgattg tccagaagga ggctaaaaca 1680 

ctggaggttc aggacggtgg tgaggctaag 1740 



catgttgagc aacaatccgg cggcgcatta attgcttcca cgacctccgg aacacttatc 1800 

gaaggaacca acagttatgg tgatgctttc tacatcagga attcagaagc taaaaatgta I8 60 

gtgctggaaa acgctggctc attaacagtc gtcactggtt cccgggcagt tgacacgatt 1920 

attaatgcca acggcaaaat ggatgtttat ggaaaagatg ttggcactgt actcaatagt 19 80 

5 gctggcaccc aaacaatata tgccagtgcc acttctgata aagcaaatat caaaggtggc 2040 

aagcaaacgg tatatggttt agccactgaa gcaaatatcg aaagtggtga acaaattgtt 2100 

gatggtgggt caacagagaa aacacacatc aatggtggca cgcaaaccgt tcagaattat 2160 

ggtaaggcga tcaataccga tatcgtctct ggcctacaac aaattatggc aaacgggaca 2220 

gcggaaggtt ccattattaa tggcggttca cagatagtta atgagggcgg tctggctgaa 22 8 0 

10 aactcggtgc ttaatgatgg cggcacactc gatgtgcggg agaaaggcag cgcaacgggg 23 40 

atacagcaga gtagccaggg cgcgttggtt gcaaccacca gggcgacgcg ggtcacagga 240 0 

acacgcgcgg atggcgtcgc gttcagcatc gagcagggtg cggcgaacaa tatcctgctg 2460 

gcaaatggcg gagtgttaac cgtggagtca gacacctctt ctgacaaaac acaggtcaat 252 0 

acgggcggac gggagatcgt caaaacaaaa gccactgcga cr.ggcacgac- gctcaccggc 2580 

15 ggtgaacaaa ttgtcgaggg tgtggcgaat gagacaacaa ttaacgacgg cggaatacaa 2640 

acagtttcag ctaacggaga ggcaataaaa acaacgatca atgaaggcgg tacgctgaca 2700 

gtcaacgata atggcaaagc gacagatatc gtccagaaca gcggtgccgc tctccagacg 27 60 

agcacggcta acggtattga aatcagcggt actcaccagt acggcacttt ttccatttcc 2820 

ggcaatttag cgaccaatat gttgctggaa aatggcggta atttattggt attagcaggt 28 80 

20 accgaagctc gcgactccac ggttggcaag gggggggcaa tgcaaaacca gggtcaggac 2940 

tccgccacaa aggttaactc tggtgggcaa tatacccttg ggcggtcaaa agatgagttt 3 0 00 

caggctctgg cccgggcaga agatctccag gttgctggcg ggacagcaat cgtctacgca 3 060 

ggtacgctgg cggatgcatc ggtcagtggc gcgacaggaa gcctgtcgtt aatgacgcca 3120 

cgggataatg ttacgccagt taaactcgaa ggggcgatcc ggattaccga tagcgcgaca 318 0 

25 ttaactatcg gcaatggcgt tgatacgacg cttgccgacc tgacggctgc cagccggggc 3240 

agtgtctggc ttaacagcaa taattcctgt gcaggcacca gcaactgcga gtatagagta 3300 

aacagtttgc tacttaacga cggtaatgtt tatttatcag cacaaacagc agcgcctgcc 33 60 

acaactaacg gtatatacaa tacgctgaca accaatgaac tttccggtag cggtaatttc 342 0 

tacctgcata ccaacgttgc aggctctcgg ggcgatcaac tggtcgtcaa caacaacgcc 348 0 

30 actggtaatt ttaaaatctt tgttcaggat accggcgtca gtcctcagtc tgacgacgcg 354 0 

atgacgctgg tgaaaacagg gggaggggat gcttcgtttt cgctgggcaa tactggcggt 3600 

ttcgttgatc ttgggaccta tgagtatgtc ctgaaaagcg atggcaacag caactggaac 3660 

ctgaccaatg atgtcaaacc caacccggat cccaacccaa atcccaaccc aaatccgaag 3720 

ccggatccaa aaccagaccc aaaaccggat ccgaaaccag acccgactcc cgagccaacg 3780 

35 ccgacacccg ttccggagaa acgcatcacg ccttctaccg cagccgtact caatatggca 3840 

gcaacattac cgttggtatt tgatgctgag ctaaacagta ttcgcgagcg gttgaacata 3900 

atgaaagcga gtccacacaa caataatgtc tggggggcga cgtataacac ccgtaataat 3960 

gtcaccaccg afcgcgggggc cgggtttgag cagacgctga ccggaatgac agtggggatc 4020 

gacagcccta atgatattcc tgaggggatt gcgacgctgg gcgcttttat gggttattcc 4080 

40 cattcacata tcggttttga tcgcggagga catggcagtg: tgggcagtta ttctctgggc 4140 

ggctatgcca gttgggaaca tgaaagtggt ttctatctgg acggtgtcgt gaagctgaac 42 00 

cgttttgaaa gtaacgtagc cggtaaaatg agcagcggtg gagccgccaa tggcagttac 4260 

cacagcaacg ggctgggcgg tcacattgaa accgggatgc gatttaccga tggtaactgg 43 20 

aacctgacgc cgtatgcatc gttaacgggg ttcaccgctg ataaccccga atatcattta 43 80 

45 tccaatggca tggaatcgaa atcagtcgat acccgcagta tatatcgtga actgggcgca 4440 

acgctgagtt acaacatgcg tctggggaac ggtatggaaa ttgagccgtg gctgaaggcg 4500 

gctgtgcgca aagaatttgt cgatgataac cgggtgaagg tgaataatga cggtaatttc 4560 

gtcaatgatt tgtcgggcag acgtggaata taccaggcag gtattaaagc ctcattcagc 4620 

agtacgttaa gcgggcatct tggggtgggg tatagccatg gtgccggtgt ggaatccccg 4680 

50 tggaacgcgg tagctggtgt gaactggtcg ttctga 4716 
<212> Type : DNA 
<211> Length : 4716 

SequenceName : SEQ ID 386
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atggcagtaa agatttcagg tgtactgaaa gacggcacag gaaaaccggt agagaactgc 60
accattcaac tgaaagccag acgtaacagc gccacggtgg tggtgaacac ggtggcctct 120
gaaaatccgg atgaagccgg tcgttacagc atggacgttg agtacggtca gtacagcgtt 180 
attctgttgg tggaagggtt cccgccgtca catgccggga. ccatcaccgt gtatgaagat 240 
tctcaacccg gtacgctgaa tgattttctt ggtgccatga ctgaggatga tgtccgtccg 300 

65 gaggcactgc gccgttttga gctgatggtg gaagaggtgg cgcgtaacgc gtccgcggtg 3 60 

gcacagaaca cggcagccgc gaagaagtca gccagcgatgj ccagcacatc agcccgtgag 420 
gcggcaaccc atgcgactga tgctgcggac tcagcacgcg cagccagcac gtcagccgga 480 
caggccgcgt cgtcggctca gtcagcgtct tccagcgcag gaacggcatc aacaaaggct 540

actgaagcat caaaaagtgc tgccgctgca gagtcctcaa aaagcgcggc ggctaccagt 600 

gccggtgcgg cgaaaacgtc agaaacgaat gcggcagtgt cacaacaatc agccgccact 660 

tctgcatcca ccgcgaccac gaaagcgtca gaagctgcct cctcagccag ggatgcgtcg 720 

5 gcttcaaaag aggcggcaaa atcatcagaa acgagcgcag cctcgacgcgc cagtagtgca 78 0 

gcctcctcgg caacggcggc aggcaattcc gcgaaggcgg ccaaaacgtc tgagacaaac 840 

gctaagtcct ctgaaacggc agcagaacag agtgcctccg cagcagcagg ctcaaaaaca 90 0 

gcggctgcat tatctgccag tgccgcgtca acaagtgccg ggcaggcctc agccagtgcc 960 

accgccgccg gaaaatcggc agaaagtgcc gcatcgtctg cttcaacagc cacaacgaag 102 0 

10 gctggcgaag ccactgaaca ggccagcgca gcagcgagtt ctgcttccgc agcgaagaca 10 8 0 

tccgaaacga acgcgaaagc gtcggaaacc agcgcagaat cctcaaaaac ggctgccgca 1140 

tcgtcagcca gttcggcggc gtcatcggca tcatctgcgt ctgcttcaaa agatgaggcg 12 00 

accagacaag cgtcagcagc gaagagcagc gccacgacgg catccacgaa ggcgacagag 1260 

gcagctggta gtgcgacggc agcagctcag agcaaaagta cggcggaatc tgcagcaacg 1320

15 cgcgctgaga cagcggcaaa acgggcagag gatattgcat ccgccgtggc gcttgaggat 13 8 0 

gcgagcacga cgaaaaaggg gatagtacag ctcagcagtg cgactaacag cacttccgag 1440 

tcactggcgg caacgccaaa agccgttaag gccgcgtatg agctggctaa cgggaaatac 150 0 

accgcacagg atgcaacgac agcacagaaa gggatagttc agcttagcaa cgcgaccaac 1560 

agcacatctg aaatgctggc ggcaacgcca aagtcggtaa aggcagccta tgaccttgct 162 0 

20 aacgggaaat atactgctca ggacgctacg acagcacaaa aaggaattgt ccagctcagt 1680 

agtgcaacca acagcgcatc tgaaacgctt gccgcgacac cgaaagcagt gaaagcagct 1740 

aatgataatg cgaatggtcg ggtaccttct gcccgtaagg tgaatg-gtaa ggcgctttca 18 0 0 

tcggatataa cactgacgcc gaaagatatt ggtacgctta actcaacaac aatgtcattc 1860 

agcggtggtg ctggttggtt caaattagca acggtaacca tgccacaggc gagttctgtt 192 0 

25 gtttcaatta cgttgattgg tggcgcggga tttaacgtgg ggtcacctca acaggcaggt 198 0 

atatctgaac ttgttttgcg tgcaggtaat ggtaatccga aggggattac tggtgcttta 2040 

tggcagcgca catcgacagg gtttacaaat tttgcctggg tcaatacatc tggtgatact 210 0 

tacgatattt acgfctgcaat cggaaattat gcgactggtg taaatattca atgggattat 2160 

accagtaatg ccagcgtgac gattcatacg tcaccagcat attctg-ctaa taagccggaa 2220 

30 gggttaacgg acggtacagt ttattcactc tatacgccat cagagca.gtt ttatccgcct 22 80 

ggcgcaccaa tcccgtggcc atcagatacc gttccgtctg gctatgccct gatgcagggg 234 0 

cagacttttg acaaatctgc atacccgaaa cttgcagccg cttatcrcgtc aggcgtgatc 2400 

cctgatatgc gtggctggac gattaagggc aaacctgcca gtggtcgggc cgtattgtcfc 2460 

caggaacagg acggcattaa atcgcacacc cacagcgcca gcgcatccag fcacggatttg 2520 

35 gggacgaaaa ccacatcgtc gtttgattac ggcactaaat ccacgaia-taa caccggggcg 2580 

cacacgcaca gtgtgagcgg tacagccgca agtgccggaa accatactca tagfcgtcaca 2640 

ggcgcatcag cagtcagcca gtggtcacaa aatgggtcag tacataaggt agtgtctgcg 2700 

gccagtgtga atacaagtgc tgcaggagcg cacactcata gtgfcca.cjcgg cacagctgca 2760 

tctgcaggtg ctcacgcaca tactgtcggt attggtgctc atacgcactc tgttgcgatt 282 0 

40 ggctcacatg gacacaccat caccgttaac gctgcgggta acgcggaaaa cactgtcaaa 28 8 0 

aacatcgcat ttaactacat tgtgaggctt gcataa 2916 
<212> Type : DNA 
<211> Length : 2916 

SequenceName : SEQ ID 387 

SequenceDescription :

Sequence 



<213> OrganismName : Escherichia coli O157:H7

<400> PreSequenceString :

atgaaacgag ttattaccct gtttgctgta ctgctgatgg gctggtcggt aaatgcctgg 60
tcattcgcct gtaaaaccgc caatggtacc gctatcccta ttggccggtgg cagcgctaat 120
gtttatgtaa accttgcgcc tgccgtgaat gtggggcaaa acctggtcgt agatctttcg 180
acgcaaatct tttgccataa cgattatccg gaaaccatta cagactatgt cacactgcaa 240
cgaggctcgg cttatggcgg cgtgttatct aatttttccg ggacccjtaaa atatagtggc 300
agtagctatc catttccgac caccagcgaa acgccgcggg ttgtttataa ttcgagaacg 360
gataagccgt ggccggtggc gctttatttg acgcctgtga gcagtgcggg cggggtggcg 420
attaaagctg gctcattaat tgccgtgctt attttgcgac agaccaaaaa ctataacagc 480
gatgatttcc agtttgtgtg gaatatttac gccaataatg atgtggtagt gcctactggc 540
ggctgcgatg tttctgctcg tgatgtcacc gttactctgc cggactaccc tggttcagtg 600
ccaattcctc ttaccgttta ttgtgcgaaa agccaaaacc tggggtatta cctctccggc 660
acaaccgcag atgcgggcaa ctcgattttc accaataccg cgtcgttttc accagcgcag 720
ggcgtcggcg tacagttgac gcgcaacggt acgattattc cagcgaataa cacggtatcg 780
ttaggagcag taggaacttc ggcggtaagt ctgggattaa cggcaaatta cgcacgtacc 840
ggcgggcagg tgactgcagg gaatgtgcaa tcgattattg gcgtgacttt tgtttatcaa 900
taa 903

<212> Type : DNA
<211> Length : 903

SequenceName : SEQ ID 388
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgggctacg ttacaggtgg attaccaatg aagaataacc gtgcgtgggc gcttatcagt 60
ggtctgatat tgttcagcgg aacggcccca gctgccgata acctgcattt taccggtaat 120
ttgcttggta aatcctgtac tcctgtaatc aatggcaact tacttgcaga aattcatttc 180
cccacaattg ctgccagcga tttaatgcaa cgtggtcagt cagatcgcgt accgttagtt 240
tttcagttga aagattgcaa aagcaccacg gcgtttaatg tcaaggtgac cttgatggga 300
acagaagata ccgacttacc aggatttctg tcgattgatt cgtcatcttc tgcaacgggt 360
gttgggattg gcattgaaac tgccggaggg gcggctgtac ctattaacag taccacaggt 420
gcctcatttc cattaaatca gggaaataac agtgtcaatt ttaatgcctg gttacagacc 480
gtaaatggac gaaatgttac atcgggtgat ttcaccgcca caatgacggt aacttttgag 540
tatttttaa 549
<212> Type : DNA 
<211> Length : 549

SequenceName : SEQ ID 389 
SequenceDescription : 



Sequence 


<213> OrganismName : Escherichia 
<400> PreSequenceString : 
atgaaacgac atctgaacac cagctacagg 
gtggtggcct ccgaactggc ccgctcacgg 

30 tctcttgctg ctgtcacatc agtcccggca 
gaaaccgtga acgatggaac actgacaaat 
aacggaatga ccatcagtac cgggctggaa 
gggcaatgga tacagaatgg cgggatagcc 
caggtcgtgc tggagggggg aacagccagt 

35 agcctgaacg gactggcggt gaacaccaca 
gagggcgggg ttgccaccgg tacaattatc 
ggcgggctgg caacaggaac catcatcaac 
aactcgtata cgggtcagaa ggtccaggga 
ggacggcaga ttatcttatt ttccgggcta 

40 gaccagtcgg tacacggaag ggccctgaat 
cacagggacg gacttgcgct gaacacggta 
gcaggtggcg ctgccggtaa caccaccata 
ggcggggaag ccactgcagt cacccagaac 
gcaactgtca tcggcacaaa ccgtctgggg 

45 ggtgttgttc tggaatccgg cggtcgtctg 
accctagtgg atgacggcgg taccctggca 
accataacat ccggtggtgc cctgattgca 
gccagcggta agttcagtat tgatggcaca 
aatggcggca gctttacggt taatgccggg 

50 cgtggaacac tgacgctggc tgccggggga 
ggcgccagta tggtactgaa tggtgatgtg 
gagattcgct ttgataatca gacgacaccg 
agtaactccc cggtaacgtt ccataaactg 
accatcaata tgcgtgttcg ccttgatggc 

55 ggtggtcagg caaccggcaa aacctggctt 
ggggtggcaa ccaccggaca gggtatccgg 
gaagaaggtg cgtttgccct gagtcgcccg 
aaccgtgaca gcgatgaaga ctggtacctg 
cccctgtata catccatgtt gacacaggca 

60 cgcagccatc agaccggtgt aaacggtgaa 
ggtcatctcg gtcacgataa caacggcggt 
ggcagctatg gcttcgtccg tctggagggt 
tctctgacga caggggtgta tggtgctgca 
gacggttccc gcgccggcac ggtccgggat 

65 ctggtacaca catcctccgg cctgtgggct 
atgaaagcgt catcggacaa taacgacttc 
ctggaaaccg gtctgccctt cagtatcact 



coli O157:H7



ctggtatgga atcacattac gggcaccctg 6 0 

ggaaaacgcg ccggtgtggc ggttgcgctg 12 0 

ctggctgctg acaaggttgt acaggcggga 18 0 

catgacaacc agattgtctt cggtacggcc 240 

ctggggccgg acagtgaaga aaacaccggt 3 00 

ggaaacacca ctgtcaccac aaatggtcgt 360 

gatacggtta ttcgtgacgg cgggggacag 420 

ctgaataaca gaggcgagca gtgggtgcat 48 0 

aaccgcgacg gttaccagag cgttaaaagt 540 

accggcgcag aaggcggccc tgattctgac 600 

acagcagaat ccaccaccat caacaaaaat 660 

gcccgtgaca ctctcattta cgcaggtggt 720 

accacactga atggcggtta ccaatatgtg 780 

attaacgagg- ggggctggca ggttgttaag 84 0 

aatcagaacg gtgaactgag ggtacatgcc 90 0 

acgggcggtg- cactggttac cagtactgct 960 

aatttcacgg tggaaaacgg taaggctgac 102 0 

gatgtactgg agagccattc agcacagaat 1080 

gtgtctgcccj gcggtaaggc gacaagtgtc 1140 

gacagtggtg ccactgttga ggggaccaat 12 00 

tccggtcagg ccagcggcct gctgctggaa 1260 

ggacaggctg gcaacaccac tgtcggacat 132 0 

agtctgagtg gcagaacaca gctcagtaaa 13 80 

gtcagtaccg gcgatattgt taacgcaggg 1440 

aatgccgcgc tgagccgtgc tgttgcaaaa 15 00 

accaccacga acctcaccgg ccagggcggc 1560 

agcaatgcct ctgaccagct ggtgattaat 1620 

gcgtttacaa atgtcggaaa cagcaacctc 1680 

gttgtggatg cacagaatgg cgccaccaca 174 0 

cttcaggcccg gcgcctttaa ctacaccctg 1800 

cgcagtgaaa atgcttatcg tgctgaagtc 18 60 

atggactatg accggattct ggcaggctcc 192 0 

aataacagcg tccgtctcag cattcagggc 19 80 

attgcccgtcj gagccacgcc ggaaagcagc 2040 

gacctgctca gaacagaggt tgccggtatg 2100 

ggccattctt ccgttgatgt taaggatgat 2160 

gatgccggca gtctgggcgg atacctgaat 2220 

gacattgtgg cccagggaac ccgtcacagc 22 80 

cgcgcccggg gctggggctg gctgggctca 2340 

gacaatctga tgctggagcc acaactgcag 2400 
tacacctggc agggactctc cctggatgac
gggcatggca gtgcacaaca tgtgcgtgcc 
acctttggtg aaggcacctc atcccgtgac 
agtgaactgc cggtgaactg gtgggtacag 
5 ggtgacatga gcatggggac agccgcagcc 
aatggcacgt cactggacct gcaggccgga 
ctgggcgttc aggccggtta tgcccacagc 
ggtcaggcta cgctgaatat gactttctga 
<212> Type : DNA 
<211> Length : 2850

SequenceName : SEQ ID 390
SequenceDescription :

Sequence


<213> OrganismName : Escherichia
<400> PreSequenceString :
atgaaaaaat ggcattatat attttgcata 
tatgcggcaa atgatggcac gtgtgcaaca 

20 tttcctctga caacggtcag tgcagcaaac 
gctaatgcaa catcttctga aaattatagc 
aatggcgctt atcacgaaat atattatacc 
accaccgcaa gtggtcttgc tttttactat 
atatctgtgc taaatgcggg gtatacggca 

25 actacaacag atcacacttg tcagggaaac 
actggagcag atgcgaagat ttcatttcgt 
atacctatca ccgatattgc attgctgtat 
gaggcgattg caaaagttcg aatttcaggc 
aatgcaggac aggtgattta ttttgatttt 

30 accgccgggc aagccattac ttcacgaaaa 
gggatggggt atgagcgtac gcagaaagtc 
agtgacgata cgatggtggc gacagacaat 
tcgaatgctg aagttagcgt caacaacggc 
atttttggtc gtaaaaatgg ttcggtaact 

35 gcccggcctc agcccggcgt ttttaacgct 
taa 

<212> Type : DNA 
<211> Length : 1083 

SequenceName : SEQ ID 391
SequenceDescription :

Sequence



<213> OrganismName : Escherichia

<400> PreSequenceString :

atgtcacgtt ataaaacagg tcataaacaa 
tgcgtggcgt gggcaaatat ctctgttcag 
ccagtaatgg cggcacgtgc gcagcatgcg 
acggtaactg ctgataataa cgtggagaaa 

50 acatttttaa gcagtcagcc agatagcgat 
accgctaaag ctaaccagga aatacaggag 
aaactgaatg tcgataaaga tttctcgctg 
atttatgata cgccgacaaa tatgttgttc 
cgtactcagt caaatattgg ttttggctgg 

55 ggggtgaaca cctttatcga ccatgattta 
gcggaatact ggcgcgatta tctgaaactg 
tggaaaaaat cgccggatat tgaggattat 
cgcgcagagg gctatttacc tgcctggccg 
tattatggcg atgaagtcgg gctgtttggt 

60 atttctgccg aggtgaccta tacgccagtg 
cagggcaaga gcggtgagaa tgacactcgc 
gaacctttgg cgaaacaact cgatacggat 
agccgctatg acctggttga gcgtaataac 
gtgatccgta ttgctctgcc tgagcgtatt 

65 gggcttgtgg tcagcaaagc aactcacgga 
ttactggctg aaggtggcaa aattaccggt 
gcttatcgtc caggcaaaga caattattat 






ggccaggata acgccggtta tgtgaagttc 2460 

ggtttccgtc tgggcagcca caacgatatg 252 0 

accctgcgcg acagtgcaaa acacagtgtg 2580 

ccttctgtta tccgcacctt cag-ctcccgg 2640 

ggcagtaaca tgacgttctc accgtcccgg 27 0 0 

ctggaagccc gtatccggga aaatatcacc 2760 

gtcagcggca gcagcgctga agg-otataac 282 0 

2 8 50 



coli O157:H7



attctctttc atttagggtt accgtgcggg 60 

agaggcggca cacatacatt aagccttaat 120 

aatgtgcctg gaaatacatt aatagatatt 18 0 

gttctgtgta actgtgattc aaaacatagc 240 

gcagaccctg ctcccggtat ggtttatagc 3 00 

cttaacgaat atgtcgatgt gggjaacaaaa 360 

gttccttttg aacatgtttc caaccaggca 420 

aaaactacag cggttggcgt gagcctgaaa 480 

attaaacgtt caataaatgg aacggtagta 540 

gccaacatat ccagcaccac gacccgtggt 60 0 

agtttgaccg caccacagtc ttgjtcagata 660 

gatactattc ctgcgtccga attttcatct 72 0 

atcactaaaa cagtgagtat tgagtgtacg 78 0 

gatgcttctt ttacggggac gaaccgaagc 840 

gctgatgtcg ggatcaaaat ttacaataaa 900 

aagttacccg cagacatggg caacacgacc 960 

ttttcggcag cacctgccag ctfcfcaccggt 102 0 

accgcgacct taaccattga atttgtaaac 108 0 

10 83 



coli O157:H7



ccacgatttc gttattcagt tctggcccgc 60 

gttctttttc cactcgctgt cacctttacc 12 0 

gttcagccac ggttgagcat gggaaatact 18 0 

aatgtcgcgt cgtttgccgc aaatgccggg 240 

gcgacacgta attttattac cggaatggcc 300 

tggctcggga aatatggtac agcgcgcgtc 3 60 

aaggattctt cgctggaaat gctttatccg 420 

actcaggggg caatacatcg tacagacgat 4 80 
cgtcattttt caggaaatga ctggatggcg 540 
tcccgtagtc atacccgcat tggtgttggt 600 
agcgccaatg gttatattcg ggcttctggc 660 
caggaacgcc cggcgaatgg ttgggatatc 72 0 

cagcttggcg caagcctgat gta-tgaacag 780 
aaagataagc gccagaaaga cccgcatgct 840 
cctcttctga cactgagcgc cgggcataag 900 
tttggcctgg aagttaacta ccgaattggc 960 

agcattcgcg agcgtcgggt actggcaggc 1020 

aacatcgttc ttgagtaccg caaatctgaa 10 80 

gaaggtaagg gcggtcagac actttccctg 114 0 

ctgaaaaatg tgcagtggga agcgccgtca 12 00 

cagggtagtc agtggcaagt aacgctcccg 12 60 

gcgatttcag cagttgccta cgataacaaa 13 20 
ggcaatacct caaaacgcgt gcagacagag
gatcgcacgg cgttaacgct tgacggtcag 
gagcaaaaac cgctggtgct gtctctgcgc 
aaagatcaga tcaagactga actaactttc 
5 ctgaaggcca ctaaatcaca ggcaaagcca 
ggggtgtatc agtctgtctt tactaccgga 
agcgttgatg gcatgagcaa aaccgtcact 
gcaaactcca ccctgagcgc taacgagccg 
gcctatacgt tgacgttgac tgcggtggac 

10 agccgcttgc gatttgttcc gcaagacact 
ataaaaccag gcgtttacag cgccgcggtt 
cgtgctttca gcgagcagta tcagctgggc 
gggccgcttg atgcagcaca ttcgtccatc 
gggacagtta gggcaatctg gacggtaaaa 

15 acgccggaag cgccgtcatt agcgggtgcc 
acaaataatg gtgatgggac gtggactgcg 
ttagaagtta tgccgaagct aaatggacag 
gtggtggctg atgcgttatc ttcaaaccag 
aaagccggcg aaagcacaac cgtgacgctg 

20 agtggtcttg cgttgtcggc aagtttgacg 
agttggaccg aaaaaggtaa cggttcctat 
ggcgagcttc gcgtcatgcc tctcttcaac 
ttgacggtca tcgccggaga gatgtcatca 
gctccgaccg tcaaaacgac gacggaactc 

25 ccggtcaccg ggctgaagcc agatgcacca 
gagcgtcctt cagcaggaaa ctggacagag 
acgctgggat ctgccgcggg tcagttgtct 
gttgctcagc cactggtgct gaacgttgca 
atgacagtga aggttaataa ccaactggct 

30 accgttgtgg acacctatgg taacccgttg 
cagggtgtga ccagcaagac ggggaataca 
attgagctta tgtcaacggt tgcgggagaa 
cagaagacgg tcacggtgaa attcaacgcg 
gtagacgccg ctgctcaaaa agtggcaaac 

35 gttgaggata aaaatggtaa ccctgttcca 
ggtgtcaagc cgcttacagg cgataatgtc 
gagttgcagg tggtttcagt gactgccgga 
agccagcctt cgaatacgca gactataacg 
tccggtattg aggtgattgg caactatgca 

40 aaagttacgg tgactgatgc caataacaac 
gccagcccgg caaatttagt tctgactccc 
caggctattt tcaccgccac gaccactgtc 
agtcaggccg acggtcagga atcgacgaaa 
acaaatgcag tactcaccgc atcatctgat 

45 actgcgaagc tggaggtgac actgatgtcg 
gtcgacatta agacgccaga aggggtgacg 
aatgaccatt tcgtgagcgg aaaaatcacg 
tatacgttca catttaacgc cctgacgtat 
accattaccg cggtggatgc cgatacggca 

50 

<212> Type : DNA 

<211> Length : 4254 

SequenceName : SEQ ID 3 92 
SequenceDescription : 

55 

Sequence 



<213> OrganismName : Escherichia 
<400> PreSequenceString : 

60 atggcgcgtg gttgggcgtc ttcagaagcc 
tttggtactg cgagaatctc tctgggtgtg 
ttcgacttcc tgcatccgtg gtatgacaca 
cttcaccgaa cagacgatcg tacccagatc 
tccagctgga tgtcaggcat caaccttttt 

65 cgcgcagggc ttggcgcaga atactggcgt 
atcggcctga ccggctggcg tagcgcacca 
gccaacggct gggatttacg cgcggaaggc 



gtggtcatta ccggagctgg tatgagcgcc 138 0 

agccgtattc aaatgcttgc taacggtaat 1440 

gacgccgagg gccagccagt cacgggcatg 1500 

aaaccggctg gaaatattgt gactcgttcc 1560 

acactgggtg agttcaccga aactgaagca 1620 

acgcagtcag gtgaggcaac gattactgtt 1680 

gcagaactgc gggccacgat gatggatgtg 1740 

tcaggtgacg tggttgctga tggtcagcaa 18 00 

tccgagggta atccggtgac gggagaagcc 1860 

aatggtgtaa ccgttggtgc catttcggaa 192 0 

tcttcgaccc gtgccggaaa cgttgttgtg 1980 

acattacaac aaacgctgaa gtttgttgcc 2040 

accctgaatc ctgataaacc ggtggttggg 2100 

gatgcctatg acaaccctgt gaccagcctc 2160,„ 

gctgctgaag gttctacggc atcgggctgg 2220 

cagattactc tcggctctac ggcgggtgaa 228 0 

aatgcggcag caaatgcggc aaaagtaacc 2340 

tcgaaagtct ctgtcgcaga agatcacgta 240 0 

gtggcgaaag atgcgcatgg caacgctatc 2460 

gggaccgcct ctgaaggggc gac cgttt cc 2520 

gttgctacgt tgactacagg tggaaagacg 2580 

ggccagccag cagccaccga agccgcgcag 2 640 

gcgaactcta cgcttgttgc ggacaataag 27 00 

accttcaccg tgaaggatgc gtacgggaac 2760 

gtgtttagcg gtgccgccag cacggggagt 282 0 

aaaggtaatg gggtctacgfc gtcgacctta 2880 

gtgatgccgc gagtgaacgg ccaaaatgcc 2940 

ggtgacgcat ctaaggctga gattcgtgat 3 000 

aatggacagt ctgctaacca. gataaccctg 3 060 

caggggcagg aagttacgct gactttaccg 3120 

gtaacaacta atgcggcagg- taaagcggac 3180 

cacaatattt ccgcttcggt gaatggtgct 3240 

gatgccagca ccggtcaggc aaacctgcag 33 0 0 

ggcaaagatg cctttacgct gacggcgaac 3360 

gggagcctgg tgacctttaa tctgccccgg 3420 

tgggtgaaag ccaacgatga ggggaaagca 34 8 0 

acgtatgaga tcacggcatc ggcagggaat 3 540 

tttgtagccg ataaggctac cgcaaccgtc 3600 

ctggcggacg gcaatgccaa acagacgtat 3660 

ctgttgaaag atagcgaagt gacgctgact 3720 

aatgggacgg cgaaaactaa tgagcaagga 378 0 

gcagcgaaat atacactcac ggcgaaagtg 3 840 

actgccgaat ctaaattcgt cgcggatgat 3900 

gtgacttctc tggtggcgga. tgggatatcg 3960 

gcaaataacc ccgttggggg gaatatgtgg 402 0 

gagaaggatt atcagttcct gccgtcgaaa 408 0 

cgtacattta gtaccagcaa. gcctggtgtc 4140 

ggcgggtacg aaatgaagcc agtgacggtg 42 0 0 

aa-999 c 9 a 99 aggcgatgaa. ctaa 42 54 



COli 0157 :H7 

tcaggcgcga tgactgattg gttaaataac 60 

gatgaagatt ttagcctgaa. aaattcgcaa 120 

cctgattatc tgctcttcag ccagcatacc 180 

aacaccggtt tgggctggccj tcatttcacc 240 

tttgaccacg acctgagccg ctatcactcc 3 00 

gattatctga agttgagcag caacgcttat 3 60 

gaattggata acgacttcga agcccgcccg 420 

tggttacctg cctggccaca actgggggga 480 
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aaactggtct atgaacaata ctatggcgat 
caaagtaacc cccatgctat taccgcaggc 
ctcagtgcgg aacagcgtca ggggaagcaa 
ctgacctggc aacccagcag ttcaatgcag 
5 cggcgcagtc tggccggtag tcgttatgac 
gaataccgca agaaagagct gattcgcctg 
ggagaaataa aaccgctggt ttcctcgcta 
atcgaagccg ctgcgctgga agctgccgga 
acggtcacgc tgccaggtta ccgcttcact 

10 atagacgtta ccgccgagga tgtaaaaggt 
gttattcagg ctccgacatt aagccagaaa 
gtggctgcag ataaaaaatc gacgaccaca 
actccggtgc cggggctggc gctgcaaacc 
tctgactgga .cagataacgg tgatggtagt 

15 tcaggttcag taacactgac gccgcaaatt 
gtcgttaata tcgtccctgt tgtctcatcc 
gtatcgtatt atgccggaga cgacatcaag 
caaccggttg catatcaaaa agaggaattg 
cctggcgcca cgattgtctg gcacgaagag 

20 gcctataagc aagggactgc actaagggca 
ctgcaatcgc atatttataa cattgaggca 
tcagcgacaa ataatgacgt ttacgccgat 
gtcactgatg agagtgataa tcccctgaca 
ggaagcgcgg agtttgtcga accgccgcag 

25 ataaacatgg taagtcaggt tgcggaagaa 
ttttcacaac ggataattgc gaaattcgtt 
ctggttgccg atccagatac cattattgct 
atcatcacag actttcataa caacccgtta 
ggtggctcgc aactggacaa cacgaccgcc 

30 cacctgacca gttcaaaagc tggtagctat 
aatattcacc agtcggtcac gatcaccgtg 
ttgaafcgccg ggtcgggcag tgcgatcgct 
agtgtgaaag atgtttatgg acacccgttg 
gcctccatga ccgggaactt cacgctaagt 

35 gatgccgtgg tcacattgcg aggcacaaaa 
accagaaata ataccgttgc ttatcagcaa 
cagctccagc cgctgactgc ctcattaaat 
accctgacgg caacgatcct ggacgcttac 
fctccagagta acgatgtcac tctaagcgaa 

40 gcgacggtaa caatgaccag caatattgcc 
gcgcaagctt ccgataataa aacgtttagt 
aaggtaataa gtataaccgg agccgaaaaa 
cggatactcg tccaggacgc gtttaacaat 
gcgcagccaa caactaacat tacgataggc 

45 gcgtacgtta accttctcag cacccaacct 
aataacagta gtagtaaggt tgacgtgaat 
tcgaaaccag aaactacggt ccataatagt 
aatgcgcggg gtgaattgat gccagggcaa 
gcaacgctaa gcaatacagg ggaagtcctt 

50 ctgaccagtg acaaagtgaa tgtctatacc 
gttcagagcc aggtaacggt tgcggttaag 
gtcgtggctt ctcctgacac catcaccgcc 
cgagtagaag atgattacgg attcccggtt 
accaaaggca gcccggtagt taatattcca 

55 acggcgacaa taaccagtac attggcagaa 
acagccaacc aatccgcaac cattacattg 
attttgaaat ccgatgttga cactctgaag 
ctaacattgc aagacaagta cggtaacccg 
cagtcaggcc ccttcgtgaa ctttctcaag 

60 tatggcgagt acaccgtgac tgtcactggc 
atgctgaacg gggttcatca ggcaaactta 
aaagaaatgt ccggtcatgt cactgcaaac 
agcgaaggct ttgcaggagc gtattacaca 
accgttgatg attatatgtt ttcaagttca 

65 aaagtttctt tcgcaaatat cggcgatcaa 
caaggaggta caacctacca gaccttaatt 
aatcatacca atatctggct agctgccaat 



gaagtggcgc tgtttgacaa gaatgatcgt 540 

ctcaactata cccccttccc gcttctgact 6O0 

ggtgaaaatg acacacgttt tgccgttgat 660 

aaacagctta atccggacga agtggccgga 720 

ctgattgatc gcaacaacaa catcgttctg 780 

agtctgctgg atccggtgaa agggaagtct 840 

cagaccaaat atgcccttaa aggctataac 90 0 

ggtaaagtca gcacgfcctgg aaaagatatc 9 SO 

aacaccccag aaaccgataa tacatggtcg 1020 

aacctgtcac ggcatgaaca aagcatggta 10 80 

gattctctgt tatccgtcaa tccgctaacc 1140 

ttgaccgtta ctgcgcacga ttccgacgga 12 00 

cgcagtgaag gcgttcagga tatcaccctg 12 60 

tacacacaga tactgacqgc cggaacgaca 13 2 0 

aacggtgaga gtgcggtaaa agaatccatc 13 80 

cgcgaccatt catcaataac aattgataac 1440 

gttagggtgg aactgaaaga cgatagcaat 15 O0 

gtaaaagccg ttactgtcga aaacagcaaa 15 60 

cagccggggg tttatgccgc gaattatccg 1620 

caacttagcc ttcacaactg gaatgctcca 168 0 

aaccagaata aggctcgcgt tgccacatta 1740 

aaaaagacat ttaataccct cacgatcaac 18 00 

aatcatcagg tcacctttaa gaatgaaaaa 18 60 

caaaatacgg atgcatatgg tgttgccaca 1920 

aatacgatta gcgccacgct gccaaatggt 19 80 

agcgattcga gtacgccaaa attcaaacaa 2040 

ggcaacagcc agggcagtac tctgaccgcc 2100 

aaagatatga aagtgaattt tgtggcacct 2160 

acaacagacc agtccggtat tgtgcgggtg 2220 

tccgtcgatg cctcgcttga ggtggataaa 22 80 

gtcccaaaca gggaacaatc ggtaatgacc 23 40 

aacaatacaa atatcgttac cctgactgcc 24 00 

ccggatgagg atgtgaaatt taccttgcca 2460 

agtgaaaccg cccgcaccga tgcaaacggt 2520 

gcgggtgagt ttacagttac ggcgacgctg 2580 

gtcactttta ttggggatac aaacagtgcg 2640 

tccattgttg cgggtaacag tacggggagt 27 0O 

caaaatccgc ttaaagacca gttggtcact 27 60 

acagaagtca ccaccaatac gctgggtcag 28 20 

ggacaacata acgtcgtggt gagccggaaa 28 80 

ttatcagtgc taccggatga aagttcggcg 294 O 

acgataacgg tgggcgaaaa catcacgcta 30 0O 

gtaatcgcgg gtcaacgcgt cagattaagt 30 60 

gatacggctt acaccgataa taacggttat 3120 

ggggtttatc aggtgacggc aacgctggac 3180 

gtggcaaatg gcaaactcga gttaacatca 3240 

gagggtatta cgctgaccgc aacggcgaga 33 00 

attatcacct ttagcgtaac gcctgaaggt 33 60 

actgaccagt caggtcaggc caaagtgacg 342 O 

gttacggcca taatgggcaa agatgttccc 34 80 

gcagatgcta aaacggcaca tgttgtgagc 3 540 

gacggcatcg atagcagcac catcacttca 3 6 0O 

gaaggtgtcg atattagtca tggcttagac 3 660 

actacgcgta ccgatcagtc cgggcaagtc 3 720 

accttaacag tcaatgtgca agttcctggc 3780 

gttgccggca cggccgatga aagtaagtca 3 840 

gctgactacc agcagagcgc aaaacttacg 39 00 

atagtgacgt ctgatcatct ggaatttgtc 3 960 

ttgagcgata ttgattacag ccaaagaaat 402 O 

ggaaaagagg gaacagcgac actcattccc 40 8 O 

agcatatcgc tgaatctcat ccaatcgata 414 O 

aaccatacct tctccacggc taaattcccg 42 0 0 

ctcaacaatg ataactttga agcgggtaaa 42 60 

cagggttggg tgtctgtcga tgcttcgggt 43 2 O 

acgtcagtca caataagcgc tgttccccga 43 8 O 

aagctgaaag gctggtgggt gaataatgga 444 O 

gcgctctgtc atgctaaaaa tgatggatat 450 O 
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aatcttcctg gcatcacaca tttgacgtct ggcgaaaaca aacgcacgca gggatcactg 4560 

tatggtgaat gggggaacgt tggagcgttt tccagtaatt cgcaatttac accgggagct 4620 

tactggacaa gtgaatctga tgattacagt cggcactact atgtgcagat gctaaccggfc 4680 

atgaccggaa gcgacgctga ttccagcccc caactgaccg cctgccgtaa atcactttaa 4740 

<212> Type : DNA 
<211> Length : 4740 

SequenceName : SEQ ID 393 

SequenceDescription : 

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgattactc atggttgtta tacccggacc cggcacaagc ataagctaaa aaaaacattg 60
attatgctta gtgctggttt aggattgttt ttttatgtta atcagaattc atttgcaaat 120
ggtgaaaatt attttaaatt gggttcggat tcaaaactgt taactcatga tagctatcag 180
aatcgccttt tttatacgtt gaaaactggt gaaactgttg ccgatctttc taaatcgcaa 240
gatattaatt tatcgacgat ttggtcgttg aataagcatt tatacagttc tgaaagcgaa 300
atgatgaagg ccgcgcctgg tcagcagatc attttgccac tcaaaaaact tccctttgaa 360
tacagtgcac taccactttt aggttcggca cctcttgttg ctgcaggtgg tgttgctggt 420
cacacgaata aactgactaa aatgtccccg gacgtgacca aaagcaacat gaccgatgac 480
aaggcattaa attatgcggc acaacaggcg gcgagtctcg gtagccagct tcagtcgcga 540
tctctgaacg gcgattacgc gaaagatacc gctcttggta tcgctggtaa ccaggcttcg 600
tcacagttgc aggcctggtt acaacattat ggaacggcag aggttaatct gcagagtggt 660
aataactttg acggtagttc actggacttc ttattaccgt tctatgattc cgaaaaaatg 720
ctggcatttg gtcaggtcgg agcgcgttac attgactccc gctttacggc aaatttaggt 780
gcgggtcagc gttttttcct tcctgcaaac atgttgggct ataacgtctt cattgatcag 840
gatttttctg gtgataatac ccgtttaggt attggtggcg aatactggcg agactatttc 900
aaaagtagcg ttaacggcta tttccgcatg agcggctggc atgagtcata caataagaaa 960
gactatgatg agcgcccagc aaatggcttc gatatccgtt ttaatggcta tctaccgtca 1020
tatccggcat taggcgccaa gctgatatat gagcagtatt atggtgataa tgttgctttg 1080
tttaattctg ataagctgca gtcgaatcct ggtgcggcga ccgttggtgt aaactatact 1140
ccgattcctc tggtgacgat ggggatcgat taccgtcatg gtacgggtaa tgaaaatgat 1200
ctcctttact caatgcagtt ccgttatcag tttgataaat cgtggtctca gcaaattgaa 1260
ccacagtatg ttaacgagtt aagaacatta tcaggcagcc gttacgatct ggttcagcgt 1320
aataacaata ttattctgga gtacaagaag caggatattc tttctctgaa tattccgcat 1380
gatattaatg gtactgaaca cagtacgcag aagattcagt tgatcgttaa gagcaaatac 1440
ggtctggatc gtatcgtctg ggatgatagt gcattacgca gtcagggcgg tcagattcag 1500
catagcggaa gccaaagcgc acaagactac caggctattt tgcctgctta tgtgcaaggt 1560
ggcagcaata tttataaagt gacggctcgc gcctatgacc gtaatggcaa tagctctaac 1620
aatgtacagc ttactattac cgttctgtcg aatggtcaag ttgtcgacca ggttggggta 1680
acggacttta cggcggataa gacttcggct aaagcggata acgccgatac cattacttat 1740
accgcgacgg tgaaaaagaa tggggtagct caggctaatg tccctgtttc atttaatatt 1800
gtttcaggaa ctgcaactct tggggcaaat agtgccaaaa cggatgctaa cggtaaggca 1860
accgtaacgt tgaagtcgag tacgccagga caggtcgtcg tgtctgctaa aaccgcggag 1920
atgacttcag cacttaatgc cagtgcggtt atattttttg atcaaaccaa ggccagcatt 1980
actgagatta aggctgataa gacaactgca gtagcaaatg gtaaggatgc tattaaatat 2040
actgtaaaag ttatgaaaaa cggtcagcca gttaataatc aatccgttac attctcaaca 2100
aactttggga tgttcaacgg taagtctcaa acgcaagcaa ccacgggaaa tgatggtcgt 2160
gcgacgataa cactaacttc cagttccgcc ggtaaagcga ctgttagtgc gacagtcagt 2220
gatggggctg aggttaaagc gactgaggtc actttttttg atgaactgaa aattgacaac 2280
aaggttgata ttattggtaa caatgtcaga ggcgagttgc ctaatatttg gctgcaatat 2340
ggtcagttta aactgaaagc aagcggtggt gatggtacat attcatggta ttcagaaaat 2400
accagtatcg cgactgtcga tgcatcaggg aaagtcactt tgaatggtaa aggcagtgtc 2460
gtaattaaag ccacatctgg tgataagcaa acagtaagtt acactataaa agcaccgtcg 2520
tatatgataa aagtggataa gcaagcctat tatgctgatg ctatgtccat ttgcaaaaat 2580
ttattaccat ccacacagac ggtattgtca gatatttatg actcatgggg ggctgcaaat 2640
aaatatagcc attatagttc tatgaactca ataactgctt ggattaaaca gacatctagt 2700
gagcagcgtt ctggagtatc aagcacttat aacctaataa cacaaaaccc tcttcctggg 2760
gttaatgtta atactccaaa tgtctatgcg gtttgtgtag aataa 2805
<212> Type : DNA
<211> Length : 2805

SequenceName : SEQ ID 394
SequenceDescription :



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgttagtac ttagcgaaag cttcaagaat aaattgcttc ccatgaatgg gtatatgaaa 60
ggcggcagcg actccggatc taaagcccag gcacgcgcaa ctgaaaaggg catcgaactg 120
cagcgtgaaa tgtggcagac gaacatgcaa aaccttgcac cgttcacgcc actcgctcag 180
cagtacgtat cacagttgca gaatctttcc tctcttcagg ggcaaggtca ggcgcttaac 240
cagtattaca actctcagca gtataaagac cttgcagggc aggcgcgcta tcagagtctg 300
gcagcagcag aggcaacggg tggattaggc tctacagcaa caggaaacca gttagcagca 360
atcgcaccta cactcggtca aaactggctg tcaggtcaga tgaacaacta caacaatctg 420
gcaaatatcg gccttggtgc tcttacaggt caggcaaacg ccggacagaa ctacgctaac 480
aatgtcagcc aattgtatca acagcaggcg gcagcatcgg cagcaaatgc gaataagcct 540
tcaggcctac agagttttgc tacaggtgcc attggtgggg ccgcatcagg tgcaatgatt 600
ggtagtgcag ttcctgttat tgggactggt attggtgctc ttgctggcgg tgttatcggt 660
ggtcttggat cattgtttta a 681
<212> Type : DNA
<211> Length : 681

SequenceName : SEQ ID 395
SequenceDescription :

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgaaaaaaa tattatcagg gttgattctg ctgctttgct gtccttatgg tttcgccgct 60
aatggtgatg gcgcaacgca catgtcaaat ttatcatttg gtccgctgac ggtggcagcg 120
gcgaataacc actccggata caatattttc gaggcactga gcaacacgac tggaacatac 180
ccggtgcgct gtcactgtga tgacacgcat ggcggaccgg gccaacaaac agcatttttt 240
cctatcttct acacgggaga tgccgcaccg gggcttgtgc ttgagcgcac tcttaatggg 300
ttaaattact atgctctgaa tgattattta tcggtcggcg tgacgatttt tattattaat 360
aaccaatatg cggccattcc ttttgaacac ttatccaacc aatccacctc accgcaacat 420
acctgtggag caggtaataa tgggagcact gtaaatctgg attcagggcg ctcggcaaaa 480
ttatctttct atgttcggca ttctattact ggcacggtga caatacccac aacggaagtc 540
gcctggttgt acgcgggcat gtccgatcat tttcccaaaa cgacccccgt ttctaaagtg 600
acaattcgcg gacaactaac ggccccgcag aactgtgagt taacgccaaa tcagagcatc 660
gatgtcgatt ttcaaaaaat taatagcgct gagttctcct caacggcggg ttcaattatt 720
gcagaaagaa agattaaaac cgaagtcaca gtatcctgta ccgggatgga agacgtaagg 780
tccacggagg tggtgagtgc gtcgatgatt gcggcaaaca gaagtgccga tgccaccatg 840
atcgtgacga gtaatccgga tgtgggaatt aagatttttg ataagaacga ccgtccagtg 900
aatgtggatg ggggcaactt acccgctgat atgggtgcta ttagtcgatt aggaaaaacc 960
gatggtagcg taacgtttta ttcagcgccc gccagtctga cgggcgcaaa accagcgcct 1020
gataatggat ttaccgctac agccacgctg gttattgaat ttactaacta a 1071
<212> Type : DNA
<211> Length : 1071

SequenceName : SEQ ID 396
SequenceDescription :



Sequence 

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgaataaaa tatatcggct aaagtggaac aggtcccgta actgttggag cgtctgctcg 60
gagctgggga gcagagtaaa aggaaaaaag tcccgggctg ttttaattag cgcgataagt 120
ttatattcat ctctggtatt cgccgatgat gtcatcgtaa accaggataa aactattgat 180
tttggcaaag agaaccagag catcgattac cgtattacgg tgacagacaa tgccaatctg 240
gtaatcaatg cgacagatac ttcccgtccg cgtctgactc tcgcttctgg tggtgggttg 300
gatattaccg gaggaaaggt aacaatcaat ggcccactta actttttgct gaaaggtacg 360
gggttcctga atgtctccaa tgctggcagc gagttatatg ctgatgattt gtatgaatca 420
aactcaggca tgagacacga tcgcggctat tttaatgtct ccaacggcgg caaaatccat 480
gttaagggca ccagccgtct gacctatttg cagggaaatg tcagtggtga aggtagccag 540
gtaaattccg aaaccttctt tatgggcgtt tacggcagtt acggtggtaa tcagtacctg 600
tcagttaata acggcggtga agttaatgcc aggaagcaaa ttagcctggg ctattatgat 660
caagtctccg atacaacact tgctgtttcg gaaggtggta aaatttctgc gcctactatt 720
agtttaagca ccaactctga gttagcgtta ggggcacagg aaggaagcgc agcgaaggca 780
gcagggatta ttgatgccga aaaaattgag tttgtgtggg caaagacatc cgagaagaaa 840
atcaccttaa accacacgga taaagacgcg actatttccg cggatattgt cagtggcagc 900
gagggcctgg gctatatcaa tgcgctcaat ggcacgactt acttaaccgg tgataactct 960
gcctttagtg gtaaagtcaa aattgagcaa aatggcgctt tagggatcac ccaaaatata 1020
ggtacagcag agatcaacaa ccgcgggaaa ttacacctga aggctgacga tagcatgacc 1080
tttgccaata agatctctgg caacggtaca ataagtatcg acagtgggac ggtggagttg 1140
accggcaata actatgcatt cagcggatat attgatgttg cttctggtgc tgtcgctgtt 1200
atttctgaag acaagaatat cggtcgtgca gagctggatg tcgatggcaa attgcaaatt 1260
aatgccaaca aagattgggt atttgataac gatcttgaag gtagaggcat tgttgaaata 1320
aacatgggga atcacgaatt ctccttcgat gagtttgctt atacagactg gttccagggt 1380
tcactggcgt tccagaacac gacatttaat ctggaaaaga atgctgagtt tctgcagaaa 1440
ggcgggatca ctgcgggtca gggaagcctg gtaacagtgg gtaagggcgc tcactccatt 1500
agcactttgg gattctccgg cggaaccgtt gattttggtg ccctgacagc aggtgcacag 1560
atgacagaag ggacggtcaa cgttagtaaa acgctggatt tgcgcggcga gggtgtgatt 1620
caggtttctg acagtgacgt tgtccgctca gtatctcgtg atattgactc tgcgttatcg 1680
ctcactgaag tcgatgatgg taacagcacc attaagttgg ttgatgcgca aggtgcggaa 1740
gttctgggcg atgcgggcaa tctgcaattg caggataaaa atgggcaaat cctctccagc 1800
agcgcccaac gtgatattca gcagaatggg caaaaagcgg ccgtcggcac ttacgactat 1860
cgtctgacga gtggggtaaa caatgacggt ctgtatattg gttacggcct gacccagctt 1920
gatttacacg ctaccgacag cgatgctctg gtgctgagct ctaacggtaa aagcgagaat 1980
gccgccgatc tcagcgcaaa gattaccggc agtggtgacc tggcattcag cagccagaag 2040
ggtcagaccg tatcgctttc taacaaagac aacgactata ccggtgttac cgatctgcgc 2100
agtgggacgc ttttgttgaa taacgataac gtgttgggta atacccatga actgcgtctg 2160
gcggcagaga ctgaactgga catgaatggt cacagccaga ccgtgggcac gctcaatggc 2220
agcgccgatt cactgctgag cttaaatggc ggcagtctga cggttaccaa cgggggcact 2280
tcaaccggtt cgttaacggg gagcggagag ctgaatattc agggcggcac gctggacatc 2340
gcgggcgata acagcaacct gacggcgaat gtgaacattg ctaattcggc taatgtcctg 2400
gtaagtcatg cgcagggatt gggtagcgca aacgttgaga acaacggtac cctggcgttg 2460
aataatagcg ctgaaaaaag agcggctgcg tctgtgaatt acgccctggg cggcaatctg 2520
accaacaacg gtacgctgat gaccggaatg tcaggacagc aagctggcaa tgtgttagtg 2580
gtgaagggga actaccacgg taataacggt caactagtaa tgaatacggt actgaatggc 2640
gatgactcag taaccgataa attggttgtc gagggcgata ctagcggcac gactgccgtt 2700
acggtgaata acgctggcgg tacaggtgcg aaaaccctta acggtatcga acttatccat 2760
gtagacggta agtctgaggg cgaatttgtt caggctgggc gtatcgttgc gggggcgtat 2820
gactacactc tcgcgcgtgg acaaggggca aatagtggta actggtatct gaccagcggc 2880
agtgattctc ctgaactgca gccggagcca gacccgatgc cgaatccaga gccaaacccg 2940
aatccagagc cgaaccctaa cccgacacct acgccgggtc cggatctgaa tgtggataat 3000
gacctgcgac cggaggcggg tagctacatt gcgaaccttg cagcagcgaa taccatgttc 3060
accacgcgtc tgcatgagcg tctgggtaat acgtactata ccgacatggt gacgggtgag 3120
cagaaacaaa ccactatgtg gatgcgccat gaaggtggtc ataataaatg gcgtgatggc 3180
agcggccagc tgaaaaccca aagcaatcgc tatgttctgc aactgggagg cgatgtcgcg 3240
cagtggagcc aaaacggcag cgaccgctgg catgttgggg tcatggcggg atatggcaac 3300
agcgacagca aaaccatttc ctcgcgaacc ggttatcgtg caaaagcgag tgtgaacgga 3360
tatagcacag gcctctatgc cacctggtat gccgatgacg agtcgcgtaa tggcgcgtat 3420
ctcgacagtt gggcgcagta cagctggttt gataacacag tgaaagggga tgacttacaa 3480
agtgaatcct ataaatcaaa aggatttacc gcttcactgg aagctggata caaacacaaa 3540
ttagctgaat ttaatggcag ccagggaacg cgtaatgaat ggtatgttca gccgcaagca 3600
caggttacct ggatgggagt caaagccgat aagcaccgcg aaagcaacgg aaccctcgtt 3660
catagcaacg gtgatggcaa tgttcaaacc cgacttggcg taaaaacctg gctgaagagc 3720
caccataaaa tggatgacgg taaatcccgc gagttccagc cgtttgtaga agtgaactgg 3780
ctacataaca gtaaggattt cagcaccagt atggatggcg tgtctgtcac tcaggatgga 3840
gcccgaaata ttgctgagat aaaaaccggg gtggaaggac agctaaatgc caacctgaat 3900
gtctggggga atgtgggcgt tcaggttgcc gataggggat ataatgacac ctctgcaatg 3960
gttggcatta agtggcaatt ctga 3984
<212> Type : DNA
<211> Length : 3984

SequenceName : SEQ ID 397
SequenceDescription :



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgataacga tgaaaaaaag tgtattgacg gcgtttataa ctgtggtatg tgcaacgtcc 60
agcgttatgg ctgctgatga taatgctatc acggatggct cagtaacatt taatggtaaa 120
gttattgctc cagcttgtac cctggtagct gcgacgaaag attccgtggt gactttgcca 180
gatgttagtg ccacgaagtt gcaaaccaat ggtcaggttt ctggcgtgca aactgatgtg 240
ccaattgaat taaaagattg tgatactacc gtaacaaaaa atgcaacgtt cacctttaat 300
ggcactgcgg atactactca gattacagcg tttgctaacc aggcctcatc tgatgctgct 360
acaaacgtgg ccctgcaaat gtatatgaat gatggtacaa cggccatcaa gccagacaca 420
gaaaccggga acattttgtt gcaagatgga gatcagacgt tgacttttaa agttgattat 480
atcgctacgg ggaaagcgac ttctggtaat gtgaatgcgg taacaaattt ccatattaac 540
tattattaa 549
<212> Type : DNA
<211> Length : 549

SequenceName : SEQ ID 398
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgagtaagt ttgtaaaaac agctattgct gcaacaatgg taatgggtgc gttcgcttct 60
acttcaacaa tcgccgctgg caacaatggt acagcacgtt tctacggcac cattgaagat 120
tccccgtgct ctatcgttcc tgatgatcac aaactggaag ttgatatggg tgacattggt 180
tcagggatcc tgaaaaataa cgggacttct acaccgaaag ctttccagat ccatctgcaa 240
gactgtgtgt ttgacaccca gacaacgatg accactacct tcaccggtaa cgcgtcttct 300
accaacagcg gcaattacta caccatttac aataccgata ctggtgcggc atttaacaat 360
gtcagcctgg ccattggtga cgctcaggga acctcttata aaagcggcgc gggtatcgaa 420
cagaaaatcg taaacgatac ggcgaccaac aaaggcaaag cgaagcagac gctggacttt 480
aaagcctggc tggtgggcgc tgctgatgcg ccagatttag gtaattttga agccaacacg 540
accttccaga ttacttatct ctaa 564
<212> Type : DNA
<211> Length : 564

SequenceName : SEQ ID 399
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :
atgcgggtta tctttctacg caaggagtat ttatctctac tcccgtcaat gattgcatct 60
cttttctctg ctaacggtgt cgcggcggcc attgatttat gccagggata tgatatcaaa 120
gcgagttgtc acgccagcag gcaaagcctt tcaggcatta cgcaggtctg gagtattgcc 180
gatgggcaat ggctggtttt ttcggatatg accaataatg ccagcggtgg ggccgtattt 240
ttgcaacaag gagcggaatt tacattatca ccagaaaatg aaactggaat gactctgttt 300
gccaataaca ccgtttcagg agaatataat aacggcgggg caatatttgc taaagaaaac 360
tcaacgctga atcttacgga tgttattttt tctggtaacg tcgcaggcgg ctatggtggc 420
gcaatctatt cttctggtac taacgatacc ggtgccatcg atttacgtgt cactaacgcc 480
gtgtttcgca ataacatcgc taatgacggc aaaggtggtg caatttatac catcaataat 540
gatatctatt taagtgatga tgtttttaac aataaccagg catatacatc aacaagttac 600
agtgatggcg atggcggcgc aatcgatgtc acagataata atagcgacag caagcatcct 660
tcaggttata cgataataaa taacactgcc tttacaaata acactgccga aggttatggc 720
ggggcgatat ataccaatag cgcgacggct ccctatctta ttgatatttc tgttgatgac 780
agctacagcc agaacggagg cgtgttagtc gatgagaaca atagcgcagc aggctatgga 840
gatggtcctt cctctgcggc gggtggcttt atgtatctcg gcttaagtga agttaccttt 900
gatattgccg acggaaaaac gctggttatt ggcaatacag agaatgacgg agctgttgac 960
tctattgctg gtaccgggtt aatcaccaaa acaggttccg gcgatctggt acttaatgca 1020
gataacaatg actttactgg tgagatgcag attgaaaacg gtgaagttac cctgggccgc 1080
agcaactccc tgatgaatgt cggcgatacg cattgccagg acgatccgca agactgctac 1140
ggtctgacga tagggagtat tgataagtac cagaatcagg cagagctaaa tgttggctcc 1200
acccaacaaa cctttgcgca ctcattgacg ggctttcaga atggcacttt aaatatcgat 1260
gctggtggca atgttactgt taatcaaggc agttttgctg gcaccatcga aggtgctggt 1320
cagctcacca ttgcgcaaaa cggcagctat gtgctggcgg gggcgcagtc gatggcgcta 1380
accggcgata tagtggtgga tgctggtgcg gtgctttcgc tggaaggcga cgcggcagat 1440
cttgccgctc tccaggacga tccgcagtcg atcgtgttaa acggcggtat gctcgatctc 1500
tctgatttct ccacctggca gagcggtaca tcatacaaag atggccttga agtcagtggc 1560
agcagcggaa cggttatcgg cagtcaggat gtggtagatc ttgcaggcgg aaacgatatg 1620
catatcggcg gcgacgggaa agatggcgtc tacgtggtga tcgatgcggg tgacgggcag 1680
gtcagcctgg caaatgacaa tcaatacctc ggcacaacgc aaatcgcttc cggtacgctg 1740
atggtgagcg acaactcgca gcttggatat acccattata accgccaggt tatctttacc 1800
gataagccac aagaaagcgt gatggagatt actgccaatg tcgatactcg ctctacaacg 1860
actgagcatg ggcgtgatat tgaaatgcgc gccgacggtg aagtggcagt tgatgcgggg 1920
gtagacacgc agtggggcgc actgatggct gacagcagcg ggcagcatca ggatgagggt 1980
agcacattga ctaaaacggg ggcgggtaca ctggagctga ccgccagcgg tacaacgcag 2040
tcagcggtga gagtagaaga gggcacgctg caaggtgatg ttgcggatat cttcccttat 2100
gcttcgtcgc tatgggtcgg tgacggggca acgttcgtta ctggcgcgga tcaggatatt 2160
cagtcaattg atgctacttc cagcggcact atcgacatca gcgatggtac ggttttgcgc 2220
ctgaccgggc aggatacttc cgtcgccctt aatgcctcac tgtttaactg cgatgggacg 2280
ctggtgaatg ccaccgatgg tgtgacgttg acaggtgagc ttaataccaa ccttgaaact 2340
gacagcctga cttatctttc caacgtgacg gttaatggca atctgaccaa tacgtccggt 2400
gcggttagcc tgcaaaatgg cgtcgctggc gatacgctga cggtaaacgg tgattatacc 2460
ggcggcggta cgctactgct cgatagcgaa ttaaacggcg atgactcggt aagcgatcaa 2520
ttggtgatga acggtaatac tgctggcaac acaactgtgg tggttaactc cattacaggg 2580
attggtgagc cgacatcgac aggcattaaa gtggttgatt tcgcagctga tcccacgcag 2640
tttcaaaaca atgcgcagtt cagtctggca ggcagcggct acgtcaatat gggagcgtat 2700
gactacacgc tggtggaaga taacaacgac tggtatctgc gatcgcaaga agtaacgccg 2760
ccatcgccac ctgatccaga cccgactccc gatcctgatc ccacgcagga tcctgatcca 2820
acacccgacc cggaacctac gcctgcttac cagccggtgt tgaatgccaa agttggcggt 2880
tatctcaata acctgcgggc ggcaaatcag gcgtttatga tggagcgacg cgatcacgca 2940
ggtggcgatg gtcagacgct gaatttacgt gttatcggcg gagattatca ttacacagca 3000
gcggggcaac tggctcaaca tgaagacact tctacggtgc aacttagcgg cgatctgttt 3060
agcgggcgct ggggcacgga tggcgagtgg atgcttggga ttgttggtgg ctacagcgat 3120
aaccagggcg acagccgctc gagtatgacc ggaactcgcg ccgataacca gaaccacggt 3180
tatgcggttg ggctgacctc aagctggttt cagcacggta agcagaagca gggggcctgg 3240
ctggataact ggttgcagta cgcgtggttt agcaatgatg tttctgaaca tgaagatggc 3300
gtggatcatt accattcgtc ggggattatc gcctcgctgg aagcggggta tcagtggtta 3360
ccggggcgtg gtgtggtgat tgaaccgcag gcgcaggtga tttatcaggg cgtgcagcag 3420
gatgatttta ccgccgctaa ccgtgcgcgc gtgtcacaat cgcagggtga tgatattcag 3480
acgcggctgg gtttacacag cgaatggcgt accgctgttc atgtcatacc aacattagat 3540
ctgaattatt atcacgatcc ccattcgacg gaaattgaag aagatgccag cactatcagt 3600
gacgatgcgg tgaagcaacg gggtgaaata aaagtgggag tcacgggcaa tatcagtcag 3660
cgagtttcgc tgcgtggtag cgtggcgtgg cagaaaggga gtgatgattt tgcccagacg 3720
gcagggtttt tgtcgatgac ggtgaaatgg taa 3753
<212> Type : DNA
<211> Length : 3753

SequenceName : SEQ ID 400
SequenceDescription :



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgcactcct ggaaaaagaa acttatagta tcacaattag cattggcttg cactctggct 60 

atcacctctc aggctaatgc agcgaccaac gatatttctg gtcaaactta caatactttc 120 

catcactaca acgacgccac ctatgctgat gacgtttact atgatggtta tgtaggctgg 180

aacaactatg ccgctgatag ctattacaac ggcgatatct acccggtcat taataacgct 240 

accgttaacg gcgtgatttc tacctactat ctggacgacg gtatttctac caataccaac 300

gccaatagtc tgacaatcaa aaacagcact attcacggta tgattacctc tgagtgcatg 360

actactgatt gtgctgatga ccgtgctact ggttatgttt atgatcgtct gacactgagc 420 

gttgataatt caacgatcga tgacaactac gagcattata cttacaacgg tacctataat 480

aatgccgctg acactcatgt tgtagatgtt tacgatatgg gtactgctat tacactggat 540 

caggaagttg atctgtccat cactaataac tctcatgtag caggtattac gctgactcag 600 

ggttatgagt gggaagatat tgacgacaac acagtcagca ctggcgtaaa cagcagcgaa 660 

gtgtttaata acactattac tgttaaagat tctactgtga cctctggttc atggactgat 720 

gaaggtacta ctggttggtt tggccatact ggtaatgcca gcaactatag caacacgctg 780

actgcagacg atgttgcaat tgccgcaatc gcaaatccgt atgctgataa tgcgatgcag 840 

actacagtaa ctttagacaa ctcaacactg atgggtgatg ttgttttctc cagtaatttc 900

gatgaaaact tcttcccgca aggtgctaac agctatcgcg atgctgatgg tgatgtagat 960 

accaacggtt gggatggcac agaccgtatg gatgtgactc tgaacaacgg cagcaagtgg 1020

gttggcgctg caatgtctgt tcatatggtt gatgaagatg gtgatggttc ttacgacgga 1080

tatgctgttg gtactgaagc aactgcaact ctgctcgata ttgcagctaa cagcctgtgg 1140 

ccttcatcaa ctgtcggtgt tgataacatc aatactcaat atgacgaaaa tggccatatc 1200

gtaggaaacg aagtttacca gagcggtttg tttaatgtga ctttgaacgg tggttcagag 1260 

tgggatacaa caaaatcttc tctgattgat actttaagta ttaacagcgg ttcccaagtt 1320

aatgttgcag actctcgtct gatctctgac actgtctctc tgactggcgg ttctaacctg 1380

aacatcggtg aagacggtca tgtagcgact aataccctga ccatcgacaa tagtaccgtt 1440 

aaaatgtctg atgatgtttc tgcgggctgg ggtttagaag atgctgcact gtacgcaaat 1500 

accatcaccg taactaacga cggtctgttg gatattaacg ttgatcagtt cgatgctaac 1560 

ccgttccagg ccgataccct gaatctgacc agtaccactg atactaacgg caacattcac 1620

gctggtgtat tcgatatcca tagcagtgat tacgtaatgg ataccgatct ggtcaacgat 1680

cgtaccaacg atactaccaa gtcaaactac ggttatggct taatcgcaat gaactctgat 1740

ggtcacctga ctattaacgg taacggcgat aacgacaaca ctgcttctat cgaagctggt 1800

cagaacgaag ttgataacaa cggtgaccat gttgcagccg cgaccggtaa ctacaaagtt 1860

cgtatcgaca acgctactgg tgctggttct atcgctgact acaacggcaa cgagctgatc 1920

tacgtcaacg acaaaaacag caacgcgacc ttctctgctg ctaacaaagc tgacctgggt 1980

gcatacacct atcaggctga acagcgcggt aacaccgttg ttctgcaaca gatggagttg 2040

accgactacg ctaacatggc gctgagcatc ccatctgcga acaccaatat ctggaacctg 2100

gaacaagaca ccgttggtac tcgtttgacc aactctcgtc atggcctggc tgataacggc 2160 

ggcgcatggg taagctactt cggtggtaac ttcaacggcg acaacggcac catcaactat 2220 

gatcaggatg ttaacggcat catggtcggt gttgatacca aaattgacgg taacaacgct 2280 

aagtggatcg tcggtgcggc tgcaggcttc gctaaaggtg acatgaatga ccgttctggt 2340 

caggtggatc aagacagcca gactgcctac atctactctt ctgctcactt cgcgaacaac 2400

gtctttgttg atggtagctt gagctactct cacttcaaca acgacctgtc tgcaaccatg 2460 

agcaacggta cttacgttga cggtagcacc aactccgacg cttggggctt cggtttgaaa 2520 

gccggttacg acttcaaact gggtgatgct ggttacgtga ctccttacgg cagcatttct 2580 

ggtctgttcc agtctggtga tgactaccag ctgagcaacg acatgaaagt tgacggtcag 2640 

tcttacgaca gcatgcgtta tgaactgggt gtagatgcag gttatacctt cacctacagc 2700

gaagatcagg ctctgactcc gtacttcaaa ctggcttacg tctacgacga ctctaacaac 2760 

gataacgatg tgaacggtga ttccatcgat aacggtactg aagggtctgc ggtacgtgtt 2820

ggtctgggta ctcagttcag cttcaccaag aacttcagcg cctataccga tgctaactac 2880 

ctcggtggtg gtgacgtaga tcaagactgg tccgcgaacg tgggtgttaa atatacctgg 2940 

taa 2943

<212> Type : DNA 
<211> Length : 2943 

SequenceName : SEQ ID 401 
SequenceDescription :

Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaactca aacatgttgg tatgattgtc gtttctgtgt tggcgatgtc gtctgctgcg 60
gtaagcgcag ccgagggtga tgaatcagta acgaccactg ttaatggcgg tgttattcat 120
tttaaaggtg aagtggtaaa tgccgcttgt gcgattgatt ccgaatcaat gaaccaaacg 180

gttgagctgg gtcaggttcg ttcttctcgc ctggctaaag cgggtgacct cagctccgcc 240 
gttggcttca atatcaagct gaatgattgt gataccaatg tttccagtaa tgcagctgtt 300

gcattcctgg gtactactgt caccagtaat gacgatacgt tagcgctgca gagttcagcg 360
gcaggctctg cccaaaatgt cggtattcaa attttggacc gtacgggtga ggtattaata 420 
cttgatgggg ccacttttag tgctaaaacc gacttgattg atggcacgaa tatactacca 480 
ttccaggctc gttatattgc tctcgggcag tccgtagctg gtactgcaaa cgcagatgcg 540 
accttcaaag ttcaatatct ataa 564 

<212> Type : DNA

<211> Length : 564 

SequenceName : SEQ ID 402 
SequenceDescription : 

Sequence



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaacttt taaaagtagc agcaattgca gcaatcgtat tctccggtag cgctctggca 60
ggtgttgttc ctcagtacgg cggcggtggc ggtaaccacg gtggtggcgg taataacagc 120
ggcccgaatt cagagctgaa tatttatcag tacggtggtg gtaactctgc acttgctctg 180

caagctgatg ctcgtaactc tgatcttact attacccagc atggtggtgg taacggtgca 240 
gatgttggtc agggctcaga tgacagctca atcgatctga cccaacgtgg ctttggtaac 300

agcgccactc ttgatcagtg gaacggtaaa gactctcata tgacagttaa acaattcggt 360 
ggcggcaacg gtgcagcggt tgaccagact gcatctaatt ccaccgtcaa cgtaactcag 420
gttggctttg gtaacaacgc gaccgctcat cagtactaa 459 
<212> Type : DNA 
<211> Length : 459 

SequenceName : SEQ ID 403 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgcctattg gtaatcttgg tcataatccc aatgtgaata attcaattcc tcctgcacct 60 
ccattacctt cacaaaccga cggtgcaggg gggcgtggtc agctcattaa ctctacgggg 120 
ccgttgggat ctcgtgcgct atttacgcct gtaaggaatt ctatggctga ttctggcgac 180 

aatcgtgcca gtgatgttcc tggacttcct gtaaatccga tgcgcctggc ggcgtctgag 240 

ataacactga atgatggatt tgaagttctt catgatcatg gtccgctcga tactcttaac 300 

aggcagattg gctcttcggt atttcgagtt gaaactcagg aagatggtaa acatattgct 360 

gtcggtcaga ggaatggtgt tgagacctct gttgttttaa gtgatcaaga gtacgctcgc 420 

ttgcagtcca ttgatcctga aggtaaagac aaatttgtat ttactggagg ccgtggtggt 480 

gctgggcatg ctatggtcac cgttgcttca gatatcacgg aagcccgcca aaggatactg 540 

gagctgttag agcccaaagg gaccggggag tccaaaggtg ctggggagtc aaaaggcgtt 600 

ggggagttga gggagtcaaa tagcggtgcg gaaaacacca cagaaactca gacctcaacc 660 

tcaacttcca gccttcgttc agatcctaaa ctttggttgg cgttggggac tgttgctaca 720 

ggtctgatag ggttggcggc gacgggtatt gtacaggcgc ttgcattgac gccggagccg 780 

gatagcccaa ccacgaccga ccctgatgca gctgcaagtg caactgaaac tgcgacaaga 840 

gatcagttaa cgaaagaagc gttccagaac ccagataatc aaaaagttaa tatcgatgag 900 

ctcggaaatg cgattccgtc aggggtattg aaagatgatg ttgttgcgaa tatagaagag 960 

caggctaaag cagcaggcga agaggccaaa cagcaagcca ttgaaaataa tgctcaggcg 1020 

caaaaaaaat atgatgaaca acaagctaaa cgccaggagg agctgaaagt ttcatcgggg 1080 

gctggctacg gtcttagtgg cgcattgatt cttggtgggg gaattggtgt tgccgtcacc 1140 

gctgcgcttc atcgaaaaaa tcagccggta gaacaaacaa caacaacaac tactacaact 1200 

acaactacaa gcgcacgtac ggtagagaat aagcctgcaa ataatacacc tgcacagggc 1260 

aatgtagata cccctgggtc agaagatacc atggagagca gacgtagctc gatggctagc 1320 

acctcgtcga ctttctttga cacttccagc atagggaccg tgcagaatcc gtatgctgat 1380 

gttaaaacat cgctgcatga ttcgcaggtg ccgacttcta attctaatac gtctgttcag 1440 

aatatgggga atacagattc tgttgtatat agcaccattc aacatcctcc ccgggatact 1500 

actgataacg gcgcacggtt attaggaaat ccaagtgcgg ggattcaaag cacttatgcg 1560 

cgtctggcgc taagtggtgg attacgccat gacatgggag gattaacggg ggggagtaat 1620 

agcgctgtga atacttcgaa taacccacca gcgccgggat cccatcgttt cgtctaa 1677 

<212> Type : DNA 
<211> Length : 1677 

SequenceName : SEQ ID 404 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgttttcta ctttcaaaaa agcagctctg ctggcagcta ttgcattacc tttttcaact 60 

atggctgcgc ctacagtcac ttttcagggt gaagtaaccg atcagacctg ttccgtaaat 120 

atcaacggtc aaaccaattc agtagtattg atgccgaccg tagccatggc tgacttcggt 180 

gcaactttag ctgatggtca gagcgcaggc cagacgccgt ttacggtttc tgtgtctaac 240 

tgccaggctc caactggtgc agatcaggca atcaacacca ccttcctggg ctacgacgtt 300 

gacgctagca cgggtgttat gggaaaccgt gataccagca gcgatgcggc gaaaggcttt 360 

ggcattcagt taatggattc cagcacttct ggtaacccag taactctggc tggcgcgact 420 

aacgtaccgg gtctgaccct gaaagttggc gataccgaag ccagctacga cttcggtgcg 480 

cgttacttcg ttatcgatag cgctgctgcc actgccggta aaattaccgc tgtcgcagaa 540 

tacaccctga gctacctcta a 561 
<212> Type : DNA 
<211> Length : 561 

SequenceName : SEQ ID 405 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaacagtg aaggaggaaa accggggaat gtactgaccg ttaacggcaa ctataccgga 60 

aacaatggcc tgatgacgtt caacgcgacg ctgggcggcg ataattcacc caccgataag 120 

atgaacgtga aaggcgatac ccaagggaac actcgcgttc gggttgataa cattggcggc 180 

gtcggtgcgc aaacggtcaa cggtattgaa ctcattgagg ttggcggtaa ttctgcaggt 240 

aatttcgcgc tgaccaccgg aactgtcgaa gctggggctt acgtctacac gctggctaaa 300 

gggaagggga atgacgagaa aaactggtat ctgaccagta aatgggacgg cgtaacgcca 360 

gcggatacac ccgatcccat caataatccc cctgttgtgg atccggaagg cccatcagtt 420 

tatcgcccgg aggccggaag ctatatcagc aacattgccg cagccaactc gctgtttagc 480 

catcgcttac acgaccgtct gggtgagccg caatatacag attcactgca ttctcaggat 540 

tcagcaagca gtatgtggat gcgtcatgtc ggggggcacg aacgttccag tgccggagac 600 

ggccagctaa atactcaggc taaccgctat gtattgcagc taggcggcga tttggcgcag 660 

tggagtagca acgcgcagga tcgctggcat cttggcgtga tggcaggcta cgccaatcag 720 
cacagtaata ctcagagtaa tcgtgtgggt tataaatcgg atgggcgcat cagcggttac 780 

agcgctgggc tgtacgcgac ctggtatcag aacgatgcga ataagaccgg cgcttatgtt 840 

gacagctggg cgctgtataa ctggtttgat aacagcgtca gttccgataa ccgttctgct 900 

gacgactatg attctcgcgg tgtgacggcc tctgttgagg gtgggtatac ctttgaagcg 960 

ggaacatgta gcggcagcga agggacgctg aatacctggt acgtccagcc acaggcgcaa 1020 

atcacctgga tgggtgtgaa agattctgac catgcccgga aagacggaac gcgcattgaa 1080 

acggaaggcg acggaaacgt gcaaacgcga cttggggtga aaacctacct gaatagccat 1140 

caccagcgtg acgatggtaa acagcgtgag ttccagcctt acattgaagc gaactggatc 1200 

aacaatagca aagtctacgc cgtgaagatg aatggtcaaa ccgtaagccg tgatggtgcg 1260 

cgaaatctcg gtgaagtacg taccggggtt gaggcgaaag taaataacaa ccttagcctg 1320 

tgggggaatg tcggtgtgca actaggtgat aaaggctata gcgatactca gggcatgctg 1380 

ggagtgaaat atagctggta a 1401 
<212> Type : DNA 
<211> Length : 1401 

SequenceName : SEQ ID 406 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgtcatatc tgaatttaag actttaccag cgaaacacac aatgcttgca tattcgtaag 60 

catcgtttgg ctggtttttt tgtccggctc tttgtcgcct gtgcttttgc cgtacaggca 120 

cctttgtcat ctgccgaact ctattttaat ccgcgctttt tagcggatga tccccaggct 180 

gtggccgatt tatcgcgttt tgaaaatggg caagaattac cgccagggac gtatcgcgtc 240 

gatatctatt tgaataatgg ttatatggca acgcgtgatg tcacatttaa tacgggcgac 300 

agtgaacaag ggattgttcc ctgcctgaca cgcgcgcaac tcgccagtat ggggctgaat 360 

acggcttctg tcgccggtat gaatctgctg gcggatgatg cctgtgtgcc attaaccaca 420 

atggtccagg acgctactgc gcatttagat gttggtcagc agcgactgaa cctgacgatc 480 

cctcaggcat ttatgagtaa tcgcgcgcgt ggttatattc ctcctgagtt atgggatccc 540 

ggtattaatg ccggattgct caattataat ttcagcggaa atagtgtaca gaatcggatt 600 

gggggtaaca gccattatgc atatttaaac ctacagagtg ggttaaatat tggtgcgtgg 660 

cgtttacgcg acaataccac ctggagttat aacagtagcg acagatcatc aggtagcaaa 720 

aataaatggc agcatatcaa tacctggctt gagcgagaca taataccgtt acgttcccgg 780 

ctgacgctgg gtgatggtta tactcagggt gatattttcg atggtattaa ctttcgcggc 840 

gcacaattgg cctcagatga caatatgtta cccgatagcc aaagaggatt tgccccggtg 900 

atccacggta ttgctcgtgg tactgcacag gtcactatta aacaaaatgg gtatgacatt 960 

tataatagta cggtgccgcc ggggcctttt accatcaacg atatctatgc cgcaggtaat 1020 

agtggtgact tgcaggtaac gattaaagag gctgacggca gcacgcagat ttttaccgta 1080 

ccctattcgt cagtcccgct tttgcaacgt gaagggcata ctcgttattc cattacggca 1140 

ggagaatacc gtagtggaaa tgcgcaacag gaaaaacccc gctttttcca aagtacatta 1200 

ctccacggcc ttccagctgg ctggacaata tatggtggaa cgcaactggc agatcgttat 1260 

cgtgctttta attttggtat cgggaaaaat atgggggcac tgggcgctct gtctgtggat 1320 

atgactcagg ctaattccac acttcccgat gacagtcagc atgacggaca atcggtgcgt 1380 

tttctctata acaaatcgct caatgagtca ggcacgaata ttcagttagt gggttaccgt 1440 

tattcgacca gcggatattt taatttcgct gatacaacat acagtcgaat gaatggctac 1500 
aacatcgaaa cacaggacgg agttattcag gttaagccga aattcaccga ctattacaac 1560 

ctcgcttata acaaacgcgg gaaattacaa ctcaccgtta ctcagcaact cgggcgctca 1620 

tcaacactgt atttgagtgg tagccatcaa acttattggg gaacgagtaa tgtcgatgag 1680 

caattccagg ctggattaaa tactgcgttc gaagatatca actggacgct cagctatagc 1740 

ctgacgaaaa acgcctggca aaaaggacgt gatcagatgt tagcgcgtaa cgtcaatatt 1800 

cctttcagcc actggctgcg ttctgacagt aaatctcagt ggcgacatgc cagtgccagc 1860 

tacagcatgt cacacgatct caacggtcgg atgaccaatc tggctggtgt atacggtacg 1920 

ttgctggaag acaacaacct cagctatagc gtgcaaaccg gctatgccgg gggaggcgat 1980 

ggtaatagcg gaagcacagg ctacgccacg ctgaattatc gcggtggtta cggcaatgcc 2040 

aatatcggtt acagccatag cgatgatatt aagcagctct attacggagt cagcggtggg 2100 

gtactggctc atgccaatgg cgtaacgctg gggcagccgt taaacgatac ggtggtgctt 2160 

gttaaagcgc ctggcgcaaa agatgcaaaa gtcgaaaacc agacgggggt gcgtaccgac 2220 

tggcgcggtt atgccgtgct gccttatgcc actgaatatc gggaaaatag agtggcgctg 2280 

gataccaata ccctggctga taacgtcgat ttagataacg cggtcgctaa cgttgttccc 2340 

actcgtgggg cgatcgtgcg agcagagttt aaagcgcgcg ttgggataaa actgctcatg 2400 

acgctaaccc acaataataa gccgctgccg tttggggcga tggtgacatc agagagtagc 2460 

cagagtagcg gcattgttgc ggataatggt caggtttacc tcagcggaat gcctctagcg 2520 

ggaaaagttc aggtgaaatg gggagaagag gaaaatgctc attgtgtcgc caattatcaa 2580 

ctgccaccag agagtcagca gcagttatta acccagctat cagctgaatg tcgttaa 2637 



<212> Type : DNA 
<211> Length : 2637 

SequenceName : SEQ ID 407 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgcagataa tctttggaga aaaatgcgtg tcattactac gactattttt tgccgccgtc 60 

ttaatgctat ggtgcgctca aaccgctgct tatagcgggc agtgtcatac cactcagggg 120 

aatccgtata ttggcgtcaa ttttggcgtt aaaaccctgg aggaagaaga aaatacgact 180 

9999Jtagtaa aagacaaatt ttatcagtgg aacgaatcga atgattatta tgtttcctgt 240 

gattgcgata aagacaatgt cagaagtggc cgatgggcat tcgccgcgga ttcaccgtta 300 

gtctatttag gcgacaactg gtacaaaatt aatgactatc ttgccgccaa agttttattg 360 

caggttaaag gcagttctcc tacagcggtt cctttcgaaa acgtggggac tggggcagat 420 

acccggtggc atatttgtga ccccggcggt caacgtttag gcggccaggg agctagcggt 480 

aatagcggta gcttttccct gaaaatattg cagccgttcg ttggttcggt cgtcattcct 540 

cctatggcgc tggcgcgatt atttgaatgc tacaacatac ccgcaggtga ttcctgcacg 600 

actacaggca caccggtttt agtgtattac ctgtctggta ctatcaattc acttggctca 660 

tgttccgtca atgccggaga aacaatcgag gtcgatctgg gcgacgtatt tgcggctaac 720 

tttcgtgttg tagggcataa gcctcttggg gccagaacgg cagaacttgc aattccagtc 780 

aggtgtaaca cgggaaacgc ggggttagtt aacgtcaacc tgagtctgac ggcaaccaca 840 

gaccccagct atccccaggc gattaagacg tcacgtcctg gcgtgggcgt ggtggtgacc 900 

gatagccaga acaacattat ttcccctgct ggtggaacat taccgctctc tattcctgat 960 

gatgcagaca gtatcgcgtg a 981 



<212> Type : DNA 
<211> Length : 981 

SequenceName : SEQ ID 408 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaaaatta aaactctggc aatcgttgtt ctgtcggctc tgtccctcag ttctacagcg 60 

gctctggccg ctgccacgac ggttaatggt gggaccgttc actttaaagg ggaagttgtt 120 

aacgccgctt gcgcagttga tgcaggctct gttgatcaaa ccgttcagtt aggacaggtt 180 

cgtaccgcat cgctggcaca ggacggagca accagttctg ctgtcggttt taacattcag 240 

ctgaatgatt gcgataccaa tgttgcatct aaagccgctg ttgccttttt aggtacggtg 300 

attgatgcgg gtcataccaa cgttctggct ctgcagagtt cagctgcggg tagcgcaaca 360 

aacgttggtg tgcagatcct ggacagaacg ggtgctgcgc tgacgctgga tggtgcgaca 420 

ttcagtgagc aaacaaccct gaataacggt actaacacca ttccgttcca ggcgcgttat 480 

tatgcaatcg gcgaggcaac cccgggtgct gctaatgcgg atgcgacctt caaggttcag 540 

tatcaataa 549 

<212> Type : DNA 
<211> Length : 549 

SequenceName : SEQ ID 409 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaaattaa aagtcatcgc tacactgatt gctactgttg ccgtgggtgt aagctttaac 60 

agcaattttg cttctgcgag tacaacgtcc gcttctttaa ccgtaaacag taacctgact 120 

atgggtacct gcagtgctca gataatggat aatagtaata aagtgatcaa tgaagtggtc 180 

tttggcaatg tttatatttc tgaactcggt gcaaaaagca aagtgcaaca gtttaaaatt 240 

cgctttagca attgctctgg ccttccccaa aacagcgccc aaatagtgct ggcacctaat 300 

ggtatatcct gtgctggttc tcaatcgtca tcggcgggtt tttctaacaa gtttactgac 360 

gctagcgcag caaccagaac ggctgtggaa gtatggacta cagatacacc ggaaagcaat 420 

ggcagtacgc aattccattg tgctcaaaag ataccagtgc ctgtgacgct tcccgccgac 480 

accacaactc agccttacga ttacccgtta agtgcacgga tgaccgttgc ggaaggtaga 540 

ttggtaaccg atgtaagacc gggtaatttc cgctctccca cgactttcac gatcacttat 600 

cagtaa 606 
<212> Type : DNA 
<211> Length : 606 

SequenceName : SEQ ID 410 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

ttggcatcaa cagttgagta tggtgagaca gttgatggtg ttgtcctgga aaaagatatc 60 

cagctggttt atgggaccgc caataatacg aaaatcaatc ctggcggaga acagcatatt 120 

aaagaatttg gtataagtag taatactgaa attaacggcg ggtatcagta cattgaaatg 180 

aatggcaccg cagaatactc agtattaaat gatggttatc aaattgttca aatgggtggc 240 

gcggcaaacc agactacgct caataatggt gtgctacagg tttatggcgc agcgaatgat 300 

cccacgatta aaggcgggcg cttaatcgtt gaaaaagatg ggattaccgt ccttgccgct 360 

atcgaaaagg gaggattact ggaggttaaa gaggggggat tagcgattgc ggtagatcag 420 

aaagcaggcg gtgctattaa agcaagcacg cgggtcatgg aggtattcgg aacaaaccgt 480 

ctcggtcagt tcgaaatcaa gaatggtatt gctaacaata tgctgttgga aaacggcgga 540 

agtttgcgag ttgaagaaaa tgacttcgct tataatacta ctgtagatag tggcggctta 600 

ctggaggtta tggatggcgg gactgcaact ggcgttgata aaaaagcagg cggaaaatta 660 

attgtctcaa cgaatgcgct ggaagtgagt ggtacaaaca gtaaaggcca atttagtata 720 

aaagatggtg tgtcaaaaaa ttatgaactg gatgatggtt ccgggcttat tgttatggag 780 

gacacgcagg ccattgacac tatcctcgat gagcatgcca ctatgcaatc gctgggaaag 840 

gatactggta cgagagtgca ggcaaatgcg gtatatgatc tcggtcgatc agatcagaat 900 

ggaagtataa cgtattcctc taaagccatc tctgaaaata tggttatcaa caatggccgc 960 

gctaacgtct gggctggcac aatggttaac gtgtcagtca gaggaaatga tggcattctt 1020 

gaggttatga agccgcaaat aaattatgca cccgcaatgt tggtgggtaa ggtagtggtt 1080 

tctgagggcg cttctttaag aacgcatggt gccgtggata ccagcaaagc ggatgtttcg 1140 

ctcgaaaata gcgcatggac catcattgcc gatatcacta cgacgaacca aaacacccgc 1200 

cttaacttag ccaaccttgc gatgtctggc gcaaatgtga ttatgatgga tgagtcagtg 1260 

actcgttcat ctgtgacggc aagtgcggaa aatttcacta cgttgaccac caataccctg 1320 

tcgggaaacg gcaattttta tatgcgtacc gatatggcga atcatcagag cgatcagctc 1380 

aacgtcaccg gtcaggcaac aggtgatttc aaaatattcg tgacggacac cggtgccagc 1440 

ccggcagcag gagatagcct tacactggta acaacgggcg gcggtgatgc tgcatttacg 1500 

ttgggcaatg ccggaggcgt tgttgatatc ggtacgtatg aatatacctt gctggataat 1560 

ggtaaccata gctggagtct ggcagagaat cgcgcgcaaa ttaccccttc aaccactgat 1620 

gtgctgaata tggcggccgc acaaccgctg gtatttgatg cagaactgga caccgtgcgt 1680 

gagcgtcttg gtagcgtaaa aggcgttagt tacgatacgg cgatgtggag ttcggcaatt 1740 

aacacccgca acaacgtgac cactgatgcg ggagctggtt ttgagcaaac attgacgggc 1800 

ctgacgctcg gtatcgatag ccgtttctcc cgtgaagaaa gcagcacaat tcgcggcttg 1860 

ttctttggtt actctcattc tgatattggt tttgatcgcg gcggcaaagg caatgtcgat 1920 

agctataccc tgggggctta tgccggttgg gagcatcaga acggtgccta tgttgatgga 1980 

gtggtgaaag ttgaccgttt tgccaacacc atccatggca agatgagtaa tggggcaaca 2040 

gcgtttggcg attacaatag taacggcgcg ggtgctcatg tcgagagcgg gttccgttgg 2100 

gttgacggat tgtggagtgt tagaccctat ctggccttta ccggctttac cacagatggt 2160 

caggactaca cgttatcaaa cggcatgcgc gcggatgtgg gaaatacccg gatattacgc 2220 

gctgaagcgg gaacggcggt aagctatcac atggacctgc aaaacggtac gacgctggaa 2280 

ccctggctga aagccgccgt gcgtcaggaa tacgccgatt ctaaccaggt gaaagttaat 2340 

gacgatggca aatttaataa tgatgtggct ggaacccgtg gcgtttatca ggctgggata 2400 

aggtcatcgt ttaccccgac gttaagcggt catttgtcag tcagctatgg caatggcgca 2460 

ggggtagaat cgccgtggaa tacccaggcg ggtgtggtct ggacgttctg a 2511 

<212> Type : DNA 
<211> Length : 2511 

SequenceName : SEQ ID 411 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgcaaagga aaggcaataa actgttgatt cagttatgca gtgtgatact gctatttttt 60 

accacatcct ggtatgcatt ggcgaatgaa tgttatatag agagaaatgc tgaaggggat 120 

tatcacatga agataagctc tactcagctt agtctggcgt cacaaatggt cgaggttccg 180 

acagaaatag ccgaagctac atgggatgta aatattcaac taagaggcga tgccataggg 240 

tgtaaatctc ttggggatag taaggcagtt cactttctta atacagctga cccaagttta 300 

atatccacgt acaccacaac gaatggcgca gcgttattaa aaacaactgt tccaggcatt 360 

gtgtattctg tcgagttatt atgccttagt tgtggtgccg cagatgaact tgatttatgg 420 

ctacctgcac aaagtggcgc agataacttc ataccaagca cccagacgaa atgggcctat 480 

gagtacagtg atcaaagttg gtatttacgt tttcgcttat tcataactcc tgaatttaaa 540 
cccaagaatg gtgtttccag cggaacaacg atagcaggaa agattgcgtc atggtatata 600 

ggtaccaatg accagccgtg gatcaacttt tacattgaca atgactcttt aaagtttttc 660 

gtcgatgaac cgacctgtgc aacagttgcc ctggcacaag atcagggcaa cgtcagtggc 720 

aatcaggtaa cgcttgggaa cagctatgtt tcggaagtga aaaatgggct tacgcgggaa 780 

atcccttttt ctatccgtgc tgaatactgt tatgccagta aaattacggt taagttgaaa 840 

gcggcaaata aacccagcga tgccacactg gtgggtaaaa cgactggctc ggcttcaggc 900 

gtggctgtaa aagtaaattc aacttatgac aatagcaaag tattgttaaa agcagatggt 960 

agcaacacgg ttgactacaa cttcgccgcc tggtcaaaca acctgctgtt tttacctttt 1020 

acggcgcagc tggtaccgga tggtagcggt aatgctgtcg gtgttggaac attttcaggt 1080 

aacgcgacct tctcctttac ctacgaataa 1110 



<212> Type : DNA 
<211> Length : 1110 

SequenceName : SEQ ID 412 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

ttgtaccagt ttactcatca aaaaagccgt atcccgaaaa aaacgctact tgcggcctgt 60 

tgtgccctgt tttatagcag caacggtgct gcggcggaca ccgtggaata tgacagttcc 120 

tttttaatgg gaactggcgc atcaacgatt gatgttaaac gttatgctca aggcaacccg 180 

acaccgccgg gtctctataa tgtccgcgta tttgtaaacg gtcaggcgac ttccagctta 240 

gaaattccgt ttgtggatat tggcgaaaac agtgcggcgg cctgtcttac ccataaaaac 300 

ctggcgcaac ttcacattaa gcaacctgaa cagcctgtca ctttactcgc cagagaaggt 360 

gaagaagagg attgtctgga tctggcaaag tcatacgaaa aggcggatgt gtgctttgac 420 

ggtagtgacc agtttctcga tctgacgatc cctcaggcct atgttctgaa aagctatggc 480 

ggctacgttg acccttcttt atgggaatcg ggaattaacg ctgccacact ggcatatacc 540 

ctgaacgcgt atcacacaag ttcagataac gacaatagtg acagcgtcta tggcgcgttc 600 

aactcaggta tcaatttagg agcctggcac tttcgtgcgc gcggtaacta taactggaca 660 

acagataacg gcagcgattt cgatttccag gatcgttact tacagcgtga cattccggca 720 

atccgttccc agataattat gggtgatgcc tataccaccg gtgaaacgtt tgactctgtc 780 

aacgtccgtg gtgttcgcct gtacagcgac agccgtatgc tgccttcggc gctggccagt 840 

tacgctccga ccatccgcgg tgtagcaaac tccaacgcca aagtcaccgt gacgcaaagc 900 

ggatataaaa tttatgaaac caccgttccg cccggtgaat ttgttataga cgacattagc 960 

ccttccggct ttggtagcga actggtcgtg accattgaag aagcggatgg ttccaaacgc 1020 

acctttacgc aacccttctc gtcggttgta caaatgcaac gtcctggtgt gggccgttgg 1080 

gatttcagcg cgggtaaagt cattgatgac agtctgcgat ccgaacccaa tatggggcaa 1140 

gcctcttatt actatggtct gaataacctc ttcacgggtt ataccggcat tcagttcacc 1200 

gataataact atcttgccgg gctgttaggt gtgggtatca acaccagcat cggcgccttt 1260 

gcggtagacg ttacccattc ccgtgctgaa attccggatg ataaaaccta ccaggggcaa 1320 

agttatcgcg tgacctggaa caaacttttc caggataccg ggacatcatt taacctcgcg 1380 

gcgtaccgct attccaccca ggattacctg ggcctgcatg atgcgttagt cctcattgac 1440 

gacgccaagc atttgtctgc cgatgaagac aaaaacacca tgcagacgta ctcacgtatg 1500 

aaaaaccagt ttaccgtcag cattaaccag ccattgaata tcgcctatga agattacggt 1560 

tcgctgttta tttccggtag ctggacgtat tactgggcgg cgaacaatag ccgcactgaa 1620 

tataatgttg gttacagtaa aagcgtttcg tggggcagtt tcagcgtcaa cctacaacgt 1680 

agctggaatg aagacggcga gaaagatgac gcgatgtacg tcagcgttag cgtacctatt 1740 

gagaatattt taggtggcaa acgtaagtct tctggtttcc gcaatttaaa tactcagctc 1800 

aataccgatt tcgatggttc acatcagttg aatgttaaca gttccggtaa cactgaaaac 1860 

aatctggtga actacagtgt caacgcaggt tatagcctcg ataaaaacgc cggcgattta 1920 

gcctctgttg gtggttatct caactatgaa tctgggttag gcggtatttc cgcttcggcc 1980 

tcggccactt ctgataacag ccaacagtac tccatctcaa ccgatggcgg ctttgtatta 2040 

cacagtggtg gtttaacgtt cactaacaac agtttcagca gtaacgacac gctggtgtta 2100 

atcaacgccc taggtgctaa aggcgcacga atcaataaca gtaataacga aatcgatcgc 2160 

tggggatatg ccgtgacgtc ctctgtcagc ccatatcgtg aaaaccgggt aggtctgaac 2220 

attgaaacac tggaaaacga tgttgaactg aaaagtacca gcgccaccac cgtaccacgt 2280 

agcggctccg ttgttttgac ccgtttcgaa actgacgagg ggcgttctgc cgtgctgaat 2340 

attactgccg ccaatggcaa atccattccg tttgctgcgg aggtttacca gggtgaggtg 2400 

atgatcggca gcatgggcca gggtggtcag gcatttgtac gcggtattaa cgacagcggg 2460 

gaattaatcg tgcgctggta tgaaaacaac caaaccattg actgtaagtt gcactaccag 2520 

ttcccggcgc agccacaaac gcagggaagc accaacacct tattacttaa caatcttacc 2580 

tgtcaggtag caaatcacta a 2601 
<212> Type : DNA 
<211> Length : 2601 

SequenceName : SEQ ID 413 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaagttca aacgattgct gcatagcggc atcgccagtt tgagtctggt tgcctgcggg 60 

gtgaatgcgg cgacggatct tggcccggca ggggatattc atttctccat cactatcacc 120 

actaaagctt gcgagatgga aaaaagcgat ctcgaagtcg atatgggaac aatgacgctg 180 

caaaaacctg cggcagtcgg tacggtgttg agcaagaaag atttcaccat tgaactcaaa 240 

gagtgcgatg ggatatccaa agcgaccgtt gagatggaca gtcagtcgga cagcgatgat 300 

gattccatgt ttgcccttga ggctggtggc gcaacgggtg ttgcgttgaa gatagaggac 360 

gataaaggaa cgcagcaagt tcccaaaggc tccagcggaa cgccgattga atgggcgatt 420 

gatggcgaaa ccacgtcgct tcactaccag gcgagttatg tggtcgtcaa cactcaggcc 480 

actggtggca cagcgaatgc ccttgtaaat ttttccatca cctatgagta a 531 

<212> Type : DNA 
<211> Length : 531 

SequenceName : SEQ ID 414 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaaataca ataacattat tttcctcggt ttatgtctgg ggttaaccac ctattctgct 60 

ttatccgcag atagcgttat taaaattagc gggcgcgtcc tcgattatgg ctgcacagtc 120 

tcatcggatt cgcttaattt taccgtagat ctccaaaaaa acagtgccag acaatttcca 180 

acgaccggta gcacaagtcc agccgtccct tttcagatta cgttaagtga atgcagcaaa 240 

gggacaacgg gggttcgggt tgcatttaac ggtattgagg acgcagaaaa taatactctg 300 

ttgaaactgg atgagggaag caatacggcc tccggtttag gtatagaaat actggacgga 360 

aatatgcgtc cggtgaaact gaatgacctt catgccggga tgcagtggat cccactggta 420 

ccagaacaga acaatatttt gccttactcc gctcgtctga agtcaactca gaagtccgtc 480 

aatccgggac tggtgagggc ttcggcaacc tttacccttg aatttcaata a 531 



<212> Type : DNA 

<211> Length : 531 

SequenceName : SEQ ID 415 
SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaaatggc gcaaacgtgg gtatttattg gcggcaatat tggcgctcgc aagtgcgacg 60 

atacaggcag ccgatgtcac catcacggtg aacggtaagg tcgtcgccaa accgtgcaca 120 

gtttccacca ccaatgccac ggttgatctc ggcgatcttt attctttcag tctgatgtct 180 

gccggggcgg catcggcctg gcatgatgtt gcgcttgagt tgactaattg tccggtggga 240 

acgtcaaggg tcactgccag cttcagcggg gcagccgaca gtaccggata ttataaaaac 300 

caggggaccg cgcaaaacat ccagttagag ctacaggatg acagtggcaa cacattgaat 360 

actggcgcaa ccaaaacagt tcaggtggat gattcctcac aatcagcgca cttcccgtta 420 

caggtcagag cattgacggt aaatggcgga gccactcagg gaaccattca ggcagtgatt 480 

agcatcacct atacctacag ctga 504 
<212> Type : DNA 
<211> Length : 504 

SequenceName : SEQ ID 416 

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaaaagag cgcctcttat aacaggactt ttgttgatat ccacatcctg cgcttatgcc 60 

tcctcagaag ggtgtggagc tgacagcact agcggtgcga caaattacag cagtgtggtt 120 

gatgatgtta cggtgaacca gacagataac gtgacaggac gggagtttac ctctgcaacg 180 

ctaagtagca ctaactggca atacgcctgt tcctgctctg cgggtaaggc agttaaactt 240 

gtctatatgg tcagccccgt acttaccacc actggacatc agacaggata ttacaaactc 300 

aatgacagcc tggatattaa aaccatgaac cgccccggaa atcctggaga ctaa 354 
<212> Type : DNA 
<211> Length : 354 

SequenceName : SEQ ID 417 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaaaaaag cacttctcgc agccgctctg gttatggctt ctggttccgc cctggctgta 60 

gatggtggtc atatcgactt taacggtatg gtacagtccg gtacctgtaa agtgggtgtg 120 

gtagatactg gtatgcatag cgttaccact gatggcgtgg ttaccctgga tactgcgaat 180 

gttactgata cttttgctga agttagcgca actgctgtcg gtttactgcc gaaagagttc 240 

atgatttctg ttgagtgtga tccaggtgct ccgaagaatg ctgagttaac tatgggttct 300 

gcaagttacg cgaacaccag cggtaccctg aataacaata tgaacatcac tgttaacggt 360 

attgcaccgg ctcagaacgt aaacattgca gttcataaca tgaaaaacaa agctggcgct 420 

gctgaaatta agcaggtcca tatgaacaac tcttctgaag ttcaggaact gacattagac 480 

gcagaaggta aaggccagta cgtatttaac gcatcttacg ttaaagcacc gaacagcccg 540 

gctgtaactg ctggtcatgt aaccactaac gcgctgtaca ccgttgctta taagtaa 597 



<212> Type : DNA 
<211> Length : 597 

SequenceName : SEQ ID 418 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaaaccaa atatgattgt aggagcatta gcgttaactt ctgtgtttat ggcaggtcac 60 

ctacaggcgg ctgatggaac agtccatttc cgtggtgaaa ttattgacag tacttgcgaa 120 

gtcactcctg aaactaaaga tcaggtcgtt gatttaggca aagtaaaccg tacagccttt 180 

agtggcgtcg atgatgtggc tgccccgacg gctttttcta tcgatctgac tcaatgcccg 240 

gaaaccttta agtccgccgc aattcgtttc gatggtaatg aagatgctca tggtaatggc 300 

aacctggcaa ttggtacccc gctggataac tctaacgatg ctgccgctgg tattagcccg 360 

agtgataaca gtggggatta tactggtgcg ggtgccgtta gtgcagcgaa aggcgtagct 420 

attcgtttat ataaccgtgc agataacact caggtcaagt tatatgaaaa ttctgcatca 480 

actccgattt ctaatggtaa tgcatccatg aagttcatgg ctcgttatat tgctacggaa 540 

acgactattg accctggtac agctaacgcc gactcgcagt ttacagttga atatataaaa 600 

taa 603 
<212> Type : DNA 
<211> Length : 603 

SequenceName : SEQ ID 419 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

gtgccaattt tccagcgtga aggccatctc aaatatagct ttgccgcagg tgaatatcag 60 

gccgggaatt atgacagcgc ctcgccgcgt ttcgggcagc ttgatctgat ctacggttta 120 

ccgtggggga tgacggccta cggcggcgta ttaatctcta ataattacaa tgcatttaca 180 

ttagggatag ggaaaaactt tggttatatc ggggcgattt ccattgatgt gacgcaggct 240 

aaaagcgaac tgaataacga tcgcgatagc cagggacaat cttatcgttt cttatattcc 300 

aagagcttcg aaagcggcac cgatttccgc cttgcgggct atcggtactc taccagcggt 360 

ttctatacct tccaggaagc caccgatgtg cgcagtgacg ctgacagcga ctataaccgt 420 

tatcacaagc gcagcgaaat acagggtaac ctgacgcagc aattaggggc ctatggctct 480 

gtttatttaa atttaacgca gcaggattac tggaacgacg caggtaaaca gaacacggta 540 

tcggcgggtt acaacggacg tattggcaag gtcagttaca gtattgcata tagctggaat 600 

aaaagccctg aatgggatga aagcgatcgc ttgtggtctt tcaatatttc cgttccacta 660 

ggccgggcct ggagtaacta tcgcgtcacg accgaccagg atggtcgtac caatcaacag 720 

gttggggtca gcggaacgct gcttgaggat cgcaacctga gctacagtgt ccaggaaggc 780 

tacgccagca acggtgtggg taacagcggt aacgctaacg ttggctatca gggtgggtcc 840 

ggtaatgtca acgtaggcta tagctacggg aaagattacc ggcagctcaa ctacagcgtt 900 

cgcggcggcg tgatagttca tagcgaaggc gtgacgcttt cccaaccgct aggcgaaacc 960 

atgacgctca tctccgtacc cggtgcgcgc aatgcccgcg tggtgaataa cggcggcgtt 1020 
caggttgact ggatgggtaa cgcgatcgtg ccttatgcca tgccgtatcg tgaaaacgaa 1080 

atctcactgc gtagcgattc gttgggtgac gatgttgacg ttgaaaatgc gttccagaaa 1140 

gtggtgccaa cgcgtggagc gattgtcaga gcgcgttttg atacccgcgt tggttaccgc 1200 

gtattaatga cgctgcttcg ttccgcgggc agcccggtgc cctttggagc aacggcaacg 1260 

ctaatcaccg ataaacaaaa cgaggtgagc agtatcgttg gtgaagaagg acagctctat 1320 

attagcggaa tgccagagga aggacgggta ttgattaaat ggggtaatga cgcgtcgcag 1380 

caatgcgtgg cgccttataa attatccctg gaattaaaac agggcggaat tattcctgtt 1440 

tcggccaatt gccagtaa 1458 
<212> Type : DNA 
<211> Length : 1458 

SequenceName : SEQ ID 420 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgagtggtt acaccgtcaa gcctcctacc ggagacagca atgagcagac acaatttatt 60 

gattatttta atctgttcta cagtaagcgt gatcaggaac aaataagcat ctctcagcag 120 

cttggaaatt acggtgcgac atttttcagt gccagtcgcc aaagttactg gaacacgtca 180 

cgcagcgacc agcaaatatc atttggatta aatgtgccgt ttggtgatat tacgacttcg 240 

ctgaattaca gctattccaa taatatatgg caaaacgatc gggatcattt actcgctttt 300 

acgcttaatg ttcccttcag tcattggatg cgtacagaca gtcagtcggc atttcgtaat 360 

tcaaacgcca gttacagtat gtcaaacgat ttgaaaggcg gcatgaccaa tctatcgggg 420 

gtttatggca ctctgctgcc ggataataac ctgaattata gcgttcaggt cggtaacacc 480 

cacggaggta atacatcgtc tggcaccagt ggttacagta ctcttaatta tcgtggagct 540 

tacggcaata ctaatgtcgg ttacagtcgg agtggtgaca gcagccagat ttattacgga 600 

atgagtggtg ggattattgc tcatgctgat ggcatcacct ttggacagcc gctgggcgac 660 

acaatggttc tggttaaggc tcctggcgct gataatgtca aaatagagaa ccagaccgga 720 

attcataccg actggcgtgg ctatgccata ttaccatttg cgacagaata tagagaaaat 780 

cgtgtcgctc ttaacgcgaa ttcccttgca gataatgttg aactggatga aaccgtggtc 840 

actgtcatcc caactcacgg tgctattgcc agagcaacat ttaatgcaca aatcggcggg 900 

aaagtattaa tgacgttgaa gtacggtaat aaaagcgttc cattcggtgc aattgtcact 960 

cacggagaga ataaaaatgg cagcattgtc gcggaaaacg gtcaggttta tctgactgga 1020 

cttccacagt cagggaaatt acaggtttca tggggcaatg ataaaaactc aaactgtatt 1080 

gtcgattaca agcttcctga agtctctcct ggaaccttgc tgaaccagca gacagcaatc 1140 
tgtcgctaa 1149 

<212> Type : DNA 
<211> Length : 1149 

SequenceName : SEQ ID 421 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgtctgctt tgtatgaacg ttcacagctg acgcaggtga tgatttcatc tgccccggcg 60 

actgctgaaa ccatggagaa ggcggaatat ctgcgcctgg actgcaccat caaggaagtc 120 

cagttcaccg ccggtcagaa acaggatatt gatgtgacca cgctctgctc cacagagcag 180 

gagaacatca acggtctggg ggcgtcgtcc gagatttcca tgtcgggtaa tttttatctg 240 

aatcaggccc agaacgccct gcgtgatgcc tatgacaatg acacggtgta tgcgtttaag 300 

gtgcagtttc cgtccggtaa gggctttaag ttcctggcgg aagtgcgtca gcacacctgg 360 

tcatccggta ccaacggcgt ggtggctgca acgttttcac ttcgcctgaa gggtaaaccg 420 

gtgtcctatg tggtaccgct ggcgtttgtg aaaaatctgg ataagacact taccgtgaat 480 

accggtgcgc tgctgacaat gtcagtcagt gtcaacgggg gaacgccgcc ttataaacac 540 

gcctggaaga aggatggtca gccggtagag ggacagacta ctgacacttt cagtaagcca 600 

ggtgcgcagt caggtgataa gggggcttat acctgcgagg taacggattc tgcagaacag 660 

ccgcagagca ttacctctga tgcgtgtaca gtaacggtta atggtgcggg cggataa 717 

<212> Type : DNA 
<211> Length : 717 

SequenceName : SEQ ID 422 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgagaaaca aaccttttta tcttctgtgc gcttttttgt ggctggcagt gagtcacgct 60 

ttggctgcgg atagcacgat tactatccgc ggctatgtca gagataacgg ctgtagtgtg 120 

gccgctgaat caaccaattt tactgttgat ctgatggaaa acgcggcgaa gcaatttaac 180 

aacattggcg cgacgactcc tgtcgttcca tttcgtattt tgctgtcacc ctgtggtaac 240 

gccgtttctg ccgtaaaagt tgggtttacc ggcgttgcag atagccacaa tgccaacctg 300 

cttgcacttg aaaatacggt gtcagcggct tcgggactgg gaatacagct tctgaatgag 360 

cagcaaaatc agatacccct taatgctcca tcgtccgcga tttcgtggac gaccctgacg 420 

ccgggtaaac caaatacgtt gaatttttac gcccggctaa tggcgacaca ggtgcctgtc 480 

actgcggggc atatcaatgc cacggctacc ttcactcttg aatatcagta a 531 

<212> Type : DNA 
<211> Length : 531 

SequenceName : SEQ ID 423 
SequenceDescription : 

Sequence 

<213> OrganismName : Escherichia coli O157:H7 
<400> PreSequenceString : 

atgaataaat ccgttgtgtc aatttctgcg gcaatgttgg ttttactttg ccaaccggtc 60 

atggggagcg aaatctcacc cgcaacaccg tcagatgaag acaactacac ctttgacccg 120 

caactcttcc gcggcagcag atttagtcag tcgtcattag caaaactgac aacacgtgag 180 

tctgttgcac cgggcaatta taaaatggat atctacacca acaataagtt gtcaggcagt 240 

tggaatgtca cgtttaaaga agccgctgat ggtcgcgttc tgccctgcct gacgcctgaa 300 

gtcgcggacg cgataggcct caaaacaggg gaagataagg gggaaaaaga tcctgtctgt 360 

acgtttgcta aggaactcgc tcccggcatc accagccaga cacagttgtc acaattgcgc 420 

ctggacttat cggtgccaca gagtcaattg attagtcgcc ctcgcggcta tgttcccccc 480 

agcgagctgg ataccggagc atcgctggcg ttcatgaatt atattgccaa ctattacaac 540 

gttgcctatt cagggcagaa tgctcatagc cagcgttcgc tatgggcatc atttaatggt 600 

ggcatcaacc ttggtgcctg gcaatatcgt cagttatcca acatgacctg ggataatgac 660 

aaagggaatc agtggaacaa tattcgtagc tatttgcaac gcccgctgcc cgccataaat 720 

agccagttaa tgatggggca gcttatcacc agcggaagat ttttctctgg actcagttat 780 

cacggcgtta gtctcgcgac cgatgaacgt atgctgccgg actccatgcg cggctatgcg 840 

ccgactattc gcggcgtggc cgcaacaaac gccagagtct cggtaatgca aaacggtcat 900 

gaaatatatc agaccaccgt ggctcctggc cctttcgaga taaacgacct ataccccacc 960 

agctacagcg gcgatctgga tgtcaccgtt acggaagcta acggcgcagt cagtcgtttc 1020 

agtgtcccct tttcagccgt accagaatcg atgcgtccag gaacttcccg ttataacgtg 1080 

gaagtaggta aaacgcagga tagtggtgat gactcgatgt ttggtgacct tacctggcag 1140 

cacgggatga ctaatacgct gacatttaac agtggttcgc gtatcgctga tggctaccag 1200

gcgctgatgc tgggcggagt ctatggcagt tcgctggggg catttggggc aaacctcact 1260 

tggtcccatg cgcgtgttcc cgaaagcgaa gcgcagagtg gttggatgtc gcaattaacc 1320 

tggagtaaaa ctttccagcc tacttcaacc accgtctccc tggcaggtta tcgatactct 1380 

accagcggct atcgtgatct ggctgatgtg ctgggagagc gtcatgctgc cagcaataaa 1440 

cagtcatggg actccagcca gtggcgtcaa cagtcgcgct tcgatcttac gttaagtcag 1500

agccttgcga attacggcaa cctgtttgtg tcaggttcaa cacagaacta ccgtggcggc 1560

aagagccgtg atacacagct tcagttaggt tacagcaata gctttagcca tggcatcagt 1620

atgaaccttt ccgtcggacg ccaaagaatg ggcggctata aagacaattc tgatgatatg 1680

cagacggtaa catccctttc attctcattc ccacttggcg gcaatggacc tcgtgtacca 1740

agtcttagca acagctggac ccattcaact gacggtagct cgcaattaca aagctcgcta 1800

accggaatgc ttgatgaagc acagaccacc aactacagcc tgaacgtcat gcgcgatcaa 1860

caatataagc agacgacgct tagcggaaac atgcaaaaac gtttttcaca aactaccgtc 1920

ggattgaacg catcgaaggg ccaggattac tggcaggctt caggtaacgt acaaggcgcg 1980

atggctgtgc atggtggcgg cattactttc ggaccttatc tgggtgaaac gttcgccctg 2040

gtcgaagcta aaggcgcaga aggtgcaaaa gtctataact ccagtcagct ggaaattaat 2100

gacagtggct atgcgcttgt tccggcagta acgccctatc gctacaaccg tatatctctc 2160 

gatccacaag gaatggatgg cgatgccgag ttggtcgaca gtgaaagaca ggtagcaccg 2220 

gttgcgggtg cggcggtgaa agtaattttc cgtacccgtc ctggtaaagc gttgctgatt 2280

aaatcccgca tggcagatgg ttcggaactg ccaatgggag ccgatgtgct ggatgagaat 2340

aatacagtcg tcggtatagc cggtcagggg gggcaaattt acctccgcac agaacagaca 2400

aaaggccact tgtcagttcg ctggggtgaa ggtgctaacg atagctgcca attgcccttt 2460

gatatcagcg ggaaggacag caatagccct atcatccgcc tgaatgaaac ctgtcagtct 2520

tga 2523 
<212> Type : DNA 

<211> Length : 2523

SequenceName : SEQ ID 424
SequenceDescription :
Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaactcg ccgcctgttt tctgacactc cttcctggct tcgccgttgc cgccagctgg 60

acttctccgg ggttccctgc ctttagcgaa cagggaacgg gaacatttgt cagccacgcg 120 

cagttgccca aaggtacgcg tccactcacg ctaaattttg accagcagtg ctggcagcct 180 

gcagatgcga taaaactcaa tcagatgctt tccctgcaac cttgtagcaa cacgccgcct 240 

caatggcgat tgttcaggga cggcaaatat acgctgcaaa tagaoacccg ctccggtacg 300 

ccaacattga tgatttccat ccagaacgcc gccgaaccgg tagcaaacct ggtccgtgaa 360

tgcccgaaat gggatggatt accgctcacg ctggatgtca gcgccacttt cccggaagga 420

gccgccgtac gggattatta cagccagcaa attgcgatag tgaagaacgg tcaaataacg 480

ttacaacccg ctgctaccag caacggttta ctcctgctgg aacgggcaga aactgacgcc 540

tctgcccctt tcgactggca taacgccacg gtttactttg tgctgacaga tcgtttcgaa 600 

aacggcgatc ccagtaatga ccagagttac ggacgtcata aagacggtat ggcggaaatt 660 

ggcacttttc acggcggcga tttacgcggc ctgaccaaca aactggatta cctccagcag 720

ttgggcgtta atgctttatg gataagcgcc ccatttgagc aaattcacgg ctgggtcggc 780 

ggcggtacaa aaggcgattt cccgcattat gcctaccacg gttattacac acaggactgg 840 

acgaatcttg atgccaatat gggcaacgaa gccgatctac ggacgctggt tgatagcgca 900 

catcagcgcg gtattcgtat tctctttgat gtcgtgatga accacaccgg ctatgccacg 960 

ctggcggata tgcaggagta tcagtttggc gcgttatatc tttctggtga cgaagtgaaa 1020 

aaaacgctgg gtgaacgctg gagcgactgg aaacctgccg ccgggcaaac ctggcatagc 1080

tttaacgatt acattaattt cagcgacaaa acaggctggg ataaatggtg gggaaaaaac 1140

tggatccgta ccgatatcgg cgattacgac aatcctggat tcgacgatct caccatgtcg 1200

ctagcctttt tgccggatat caaaaccgaa tcaactaccg cttctggtct gccggtgttc 1260

tataaaaaca aaacggatac ccacgctaaa gccatcgacg gctttacccc tcgcgattac 1320

ttaacccact ggttaagtca gtgggtccgc gactatggga ttgatggttt tcgggtcgat 1380

accgccaaac atgttgagtt gcccgcttgg cagcaactga aaaccgaagc cagcgccgcg 1440 

cttcgcgaat ggaaaaaagc taaccccgac aaagcattag atgacaaacc tttctggatg 1500 

accggtgaag cctggggcca cggcgtgatg caaagtgact actatcgcca cggcttcgat 1560 

gcgatgatca atttcgatta tcaggagcag gcggcgaaag ctgtcgattg tattgcgcag 1620 

atggatacga cctggcagca aatggcggag aaattgcagg gtttcaacgt gttgagctac 1680 

ctctcgtcgc atgatacccg tctgttccgt gaagggggcg acaaagcagc agagttatta 1740 

ctattagcgc caggcgcggt acaaatcttt tatggcgatg aatcctcgcg tccgttcggt 1800 

cctacaggtt ctgatccgct gcaaggtaca cgttcggata tgaactggca ggatgttagc 1860 

ggtaaatctg ccgccaacgt cgcgcactgg cagaaaatca gccagttccg cgcccgccat 1920

cccgcaattg gcgcgggcaa acaaacgaca ctttcgctga agcagggcta cggctttgtt 1980 

cgtgagcatg gcgacgataa agtgctggtc atctgggctg ggcaacagtg a 2031

<212> Type : DNA
<211> Length : 2031

SequenceName : SEQ ID 425

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgccacaac gacaccacca gggacataaa cgcacaccga aacagttggc gctcattatc 60 

aaacgctgtt tgccgatggt gctcactggc agcggcatgc tttgcactac cgctaacgcc 120 

gaagagtatt atttcgaccc cattatgctg gaaaccacaa aaagtggtat gcaaacaacc 180 

gatctgtcac gtttttcaaa aaaatacgca caactaccag gaacttatca ggttgatatc 240 

tggctgaata aaaagaaggt ttcacagaaa aaaattacat ttaccgccaa tgcagagcaa 300

cttctgcagc cacagtttac ggtagaacaa ctacgtgagc tgggtattaa ggtggatgaa 360

atcccggcgc tggctgaaaa agatgacgat agcgtgatca actcgcttga acaaatcatt 420

cccggtacag ctgctgaatt tgatttcaat catcagcgac ttaatttgag cattccccaa 480

attgcactgt accgtgatgc aagaggttac gtctcccctt ctcgttggga cgatggtata 540 

ccaacgctgt ttaccaacta ctcgtttaca ggttctgata accgttaccg ccagggcaat 600 

cgtagccaac gacagtacct aaatatgcaa aatggtgcca attttggccc ctggcgatta 660 

cgtaactatt ctacgtggac acgcaacgat caggcgtcaa gctggaacac tatcagtagt 720 

tatttacaac gtgatatcaa ggcgttgaag tctcagttgc ttctgggaga aagcgccacc 780 

agcggcagta ttttttccag ctacaacttt actggcgtgc aactcgcttc cgacgataat 840 

atgttgccaa acagccagcg cggatttgcc ccaacggtac gcggtatcgc aaacagtagt 900 

gcaatcgtga ctatcaggca aaatggttat gtgatctatc aaagcaacgt gccagcgggt 960 

gcctttgaaa ttaacgatct ctacccctct tccaacagcg gcgatttaga agtcacgatt 1020 

gaagaaagtg acggtacgca acgtcgcttt atccagcctt attcttcatt acccatgatg 1080

cagcgacctg ggcatctaaa atatagcgcg accgctggac gctatcgcgc tgatgcaaac 1140

agtgatagca aggaacccga atttgctgaa gccacggcaa tatatggttt gaataatact 1200

tttacgctgt atggcggcct gctcggttct gaagattatt atgcgctggg gatcggtatc 1260

ggcggcacac ttggcgcact gggcgcgttg tcgatggata tcaacagagc tgacacccaa 1320

ttcgataacc agcactcttt tcatggctat caatggcgta cgcagtacat caaagatatc 1380

ccggaaacca acaccaatat cgctgtcagc tactatcgct ataccaacga tggctatttt 1440

agttttgatg aagccaatac ccgcaattgg gactataaca gtcgccaaaa aagtgaaatt 1500

caattcaaca tcagccagac aatafottgat ggggtaagtc tgtatgcctc cggttcacag 1560

caagactatt ggggcaataa cgagaaaaac aggaatatct ctgttggggt ttccggccag 1620

caatggggaa ttggttacag cctgaattat caatacagcc gctacactga tcaaaataat 1680

gaccgcgcac tctctttgaa tctcagtatt ccgttagaac gctggttacc gcgtagccgg 1740

gtttcctatc agatgaccag ccagaaagat cgcccaaccc aacatgaaat gcgtcttgat 1800

ggctcactgc tggatgatgg tcgcctgagc tatagtctgg aacaaagtct ggatgacgat 1860

aacaaccata acagtagcgt gaacgccagt taccgttcac cttatggaac cttcagtgcc 1920

ggatacagtt acggtaatga cagtagccaa tacaattacg gcgttaccgg cggcgtggtt 1980

atccatcctc atggtgtgac gctctcgcaa tatctgggca acgcttttgc gcttattgat 2040

gctaacgggg catctggcgt gaggatacaa aactatccgg ggattgctac tgatcccttt 2100

ggctatgcag tggttcctta tctcacaact tatcaggaaa accgtctctc ggtagatact 2160

acgcagctgc ccgataacgt cgatcttgaa caaacaacac agtttgtggt gcccaacaga 2220

ggtgcaatgg tagcggcgcg tttcaacgcc aatatcggtt atcgcgtact tgttacagtc 2280

agcgatcgca acggtaaacc gttgcccttt ggcgctcttg ccagcaacga tgatacgggg 2340

caacaaagta tcgtcgatga gggcggcata ctatatctct ctgggatatc gagtaaatca 2400

caaagctgga ctgtacgctg gggaaatcag gcagatcaac aatgtcagtt tgcttttagt 2460

acaccggatfc cagaaccaac aacctctgta ttacaaggca cagcgcagtg ccattaa 2517

<212> Type : DNA
<211> Length : 2517

SequenceName : SEQ ID 426
SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgatgttca gaaatagaat attactaata tttatattgt gggctaattt tacctgggct 60

gggtgtcgta ctactgcatc attaaatatt acagatggta ttaatgttgg ggagatttta 120

gcgaatgaaa cfctcctttag taaaagtgtc gtgtttactg ggatatcttg tgatacgagc 180

acggataaaa tagtttataa aaatatccaa agtgattggg ttgaagttgg gccttttggt 240

aatggcgaaa aattaaaggt taaaatagag tctttaggta aaaccagcga cacaattggg 300

aaatccagca atgcgcaggc agtattacct tatgtggtta aaatagccag aggcacacct 360

gattttactg gagaaagaaa atctacctgg tttatttcag ataccgtgat tgcaaatatt 420

ggcggtgagt catcgtcatc catcgatttt tggttgggta tttgtaaggc attgaagttt 480

aactggtgtg tgaattatct caccagcaaa ctggcggggg atacatttac gcttgggtta 540

aatatttcct attatcctaa aaatacgacc tgtaagcctg aaaacaccgt tataaaagta 600

gatgatatcg ccttgttcca gctcagaaat cagggaaaga ttgcggcgaa cagtaaggaa 660

ggaacaatta cgttgaaatg tgataatctt ttcggcgaca aaaaacaagc atcgcggaat 720

atggttgtat atctttctag cagtgactta gttaaaggaa gtaatactat tttgcgtggt 780

aaaacagata atggtgtagg gtttgtgttg gatctaacag aaccaccaaa agggactgag 840

gctgccatta aaatttcggc caacggcgat cagggcgcgg cgacatcatt atggaaaaca 900

gataaaccag gagtttcatt aaatagcaac attattaata taccagtcat ggccagttac 960

tatgtatatg atgaaaaaaa agttaaatct ggcgcactgg aagcaaccgc attaatcaac 1020

gtgaaatacg attaa 1035

<212> Type : DNA
<211> Length : 1035

SequenceName : SEQ ID 427

SequenceDescription :

Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgattaaaa aagcttcgct gctgacggcg tgttctgtca cagccttttc cgcttgggca 60

caggatacca gcccggatac tctcgtcgtt actgctaacc gttttgaaca gccgcgcagc 120

actgtgcttg caccaaccac cgttgtgacg cgtcaggata tcgaccgctg gcagtcgacc 180

tcggttaatg atgtgctgcg ccgtcttccg ggcgtcgata tcacccaaaa cggcggttca 240

ggtcagctct catctatttt tattcgcggt acaaatgcca gtcatgtgtt ggtgttaatt 300

gatggcgtac gcctgaatct ggcggggggg agtggttctg ccgaccttag ccagttccct 360

attgcgcttg tccagcgtgt tgaatatatc cgtgggccac gctccgccgt ttatggttcc 420

gatgcaatag gtggggtggt gaatatcatc acgacgcgcg atgaacccgg aacggaaatt 480 

tcagcagggt ggggaagcaa tagttatcaa aactatgatg tctctacaca gcaacaactg 540 

ggggataaga cacgagtaac gttgttgggc gattatgccc atactcatgg ttatgatgtt 600 

gttgcctatg gtaataccgg aacgcaagcg cagccagata acgatggttt tttaagtaaa 660

acgctttatg gcgcgctgga gcataacttt actgatgcct ggagcggctt tgtgcgcggc 720

tatggctatg ataaccgtac caattatgac gcgtattatt ctccgggttc accattggtc 780 

gatacccgta aactctatag tcaaagttgg gacgccgggc tgcgatataa cggcgaactg 840 

attaaatcac aactcattac cagctatagc catagcaaag attacaacta cgatccccat 900 

tatggtcgtt atgattcgtc ggcgacgctc gatgagatga agcaatacac cgtccagtgg 960

gcaaacaaca tcatcattgg ccacggtaat gttggtgcgg gtgttgactg gcagaagcag 1020

agcacggcac cgggcacagc ttatgttaag gatggatatg atcaacgtaa taccggcatc 1080

tatctgaccg ggctgcaaca agtcggcgat tttacctttg aaggcgcagc acgcagcgac 1140 

gataactcac agtttggtcg tcatggaacc tggcaaacca gcgccggttg ggaattcatc 1200

gaaggttatc gcttcattgc ttcctacggg acatcttata aggcaccaaa tctggggcaa 1260

ctgtatggct tctacggaaa tccgaatctg gacccggaga aaagcaaaca gtgggaaggc 1320

gcgtttgaag gcttaaccgc tggggtgaac tggcgtattt ccggatatcg taacgatgtc 1380

agtgacttga tcgattatga tgatcacacc ctgaaatatt acaacgaagg gaaagcgcgg 1440 

attaagggcg tcgaggcgac cgccaatttt gataccggac cactgacgca tactgtgagt 1500 

tatgattatg tcgatgcgcg caatgcaatt accgacacgc cgttgttacg ccgtgctaaa 1560

cagcaggtga aataccagct cgactggcag ttgtatgact tcgactgggg tattacttat 1620

cagtatttag gcactcgcta tgataaggat tactcatctt atccttatca aaccgttaaa 1680 

atgggcggtg tgagcttgtg ggatcttgcg gttgcgtatc cggtcacctc tcacctgaca 1740 

gttcgtggta aaatagccaa cctgttcgac aaagattatg agacagtcta tggctaccaa 1800 

actgcaggac gggaatacac cttgtctggc agctacacct tctga 1845
<212> Type : DNA 
<211> Length : 1845 

SequenceName : SEQ ID 428 
SequenceDescription : 



Sequence

<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaaaaaca aattgttatt tatgatgtta acaatactgg gtgcgcctgg gattgcagcc 60
gcagcaggtt atgatttagc taattcagaa tataacttcg cggtaaatga attgagtaag 120

tcttcattta atcaggcagc cataattggt caagctggga ctaataatag tgctcagtta 180 
cggcagggag gctcaaaact tttggcggtt gttgcgcaag aaggtagtag caaccgggca 240 
aagattgacc agacaggaga ttataacctt gcatatattg atcaggcggg cagtgccaat 300

gatgccagta tttcgcaagg tgcttatggt aatactgcga tgattatcca gaaaggttct 360
ggtaataaag caaatattac acagtatggt actcaaaaaa cggcaattgt agtgcagaga 420

cagtcgcaaa tggctattcg cgtgacacaa cgttaa 456 
<212> Type : DNA 
<211> Length : 456 

SequenceName : SEQ ID 429

SequenceDescription : 



Sequence 



<213> OrganismName : Escherichia coli O157:H7
<400> PreSequenceString :

atgaacattt ttgcatattt actggtactt gtattttcca tgagcatgag cagcagcgcg 60

tttgccagcg tggtaatgac cggaacccgt attattttcc ctggtgacgc aaaggaaaaa 120

accatccagt tgcgaaatac cagcgatcag ccctatatca ttaatatcca tgttgaggat 180

gaacgtggtt ctgacaagaa tgtaccgttt atgccaaccc cgcagacatt tcgcatggaa 240

gctgccgcag gtcaggcgtt acgcctgctc tacactggta ataatttacc gcaggatcgc 300

gagtctgttt tctggtttag tttcagtcaa ctaccttatc tgaataagaa tgataaaagt 360

cagaaccagc tcatcctggc cctgactaat cgagtcaaaa ttttctatcg tcccagctcg 420

attgtcggta aatccagtga cgcacccaaa aacctgactt accaggtaaa acagaaccgc 480

attgaagtga cgaatcccac gggctattac gtcacaattc gcgccgctga actgcttaat 540

aatggtaaaa aagtccccct cgcgaattcg gtaatgattg ctcctcaaag cacaactgaa 600

tggacactac cctctggcat cagtgtcgct cccggtgcgc agatccattt agtgaccgtc 660

aacgactatg gcgtaaatgt tacgtctgag catgccttat aa 702

<212> Type : DNA
<211> Length : 702

SequenceName : SEQ ID 430
SequenceDescription :



